• Remediating AWS IMDSv1

    Heya! We’re in the process of reworking our blog; sorry for the radio silence. That’s not done yet, but in the meantime we wrote an article on remediating IMDSv1 in AWS, a common server-side request forgery target that enables lateral movement and persistence. You can read it on Google Docs here.

    (You should expect a few pretty exciting announcements soon :))

  • The SOC2 Starting Seven

    So, you plan to sell your startup’s product to big companies one day. Congratu-dolences!

    Really, that’s probably the only reason you should care about this article. If that’s not you, go forth and live your life! We’ll ask no more of your time.

    For the rest of you: Industry people talk about SOC2 a lot, and it’s taken on a quasi-mystical status, not least because it’s the product of the quasi-mystical accounting industry. But what it all boils down to is: eventually you’ll run into big-company clients demanding a SOC2 report to close a sale. You know this and worry about it.

    Here’s how we’ll try to help: with Seven Things you can do now that will simplify SOC2 for you down the road while making your life, or at least your security posture, materially better in the meantime.

    We’ll cut to the chase and list them out, and then dig into explanations.

    #1: Single Sign-On

    You’re going to sign up for Okta or Google Cloud Identity, tie as many of your applications (first- and third-party) into it as you can, and force 2FA.

    #2: PRs, Protected Branches, and CI/CD

    You’re already doing this, but just to be sure: you’re going to lock your deploy branch and require PR approval to merge to it, and you’re going to automate deployment.

    #3: Centralized Logging

    You’re going to pick a logging service with alerting and use it for as close to everything as possible.

    #4: Terraform Or Something

    You’re going to do all your cloud provisioning with something like Terraform and keep the configs in Github, using the same PR process as you do for code.

    #5: CloudTrail And AssumeRole

    You’re going to set up CloudTrail logs and require your team to use AssumeRole to get to anything interesting in your AWS configuration.

    #6: MDM

    You’re going to pick an MDM system – it’s probably going to be Jamf Pro – and install it on all your desktops and laptops, and then use it to make sure everyone’s got encrypted disks and up-to-date patches.

    #7: VendorSec

    You’re going to start tracking all the software you subscribe to, buy, or install in a spreadsheet and start doing some simple risk tracking.

    You do these Seven Things, then write some basic policy documentation, and you’re at approximately the maturity level of the firms we’ve talked to or worked with that are SOC2-certified. Since these are all things you should be doing anyways, you might as well do them early, when they’re easy to roll out. We’ve been through the “retrofit Single Sign On” or “change the way all the logs are collected 3 years in” exercises, and they are avoidable nightmares, so avoid them.

    Now, before we get into the details of the Seven Things, some thoughts about SOC2.

    First: for us, SOC2 is about sales. You will run into people with other ideas of what SOC2 is about. Example: “SOC2 will help you get your security house in order and build a foundation for security engineering”. No. Go outside, turn around three times, and spit. Compliance is a byproduct of security engineering. Good security engineering has little to do with compliance. And SOC2 is not particularly good. So keep the concepts separate.

    This is not an instruction guide for getting SOC2-certified. Though: that guide would be mercifully short: “find $15,000, talk to your friends who have gotten certified, get referred to the credible auditor that treated your friends the best, and offer them the money, perhaps taped to the boom box playing Peter Gabriel you hold aloft outside their offices”.

    If there is one thing to understand about SOC2 audits, it’s: SOC2 is about documentation, not reality. SOC2 audits are performed by accountants, not pentesters. You’ll tell your audit team what security things you try to do. They’ll call upon the four cardinal directions of ontology in a ceremony of shamanic accountancy. They’ll tell you those security things are just fine. Then they’ll give you a 52,000-line questionnaire called the Information Request List (IRL), based in some occult way on what you told them you’re doing. And you’ll fill it out. You’ll have a few meetings and then write them a check. They’ll put your company name on a report.

    We are close to a point where we can offer you some practical SOC2 advice, but before we can, you need two more concepts.

    See, there are two kinds of SOC2 audits startups undergo: Type 1 and Type 2. The subject matter of both audits is the same. The difference is that Type 1 audits set a standard for your company, and Type 2 audits measure how well you complied with that standard. Type 1 audits are the “easy” audits and the only ones many startups care about at first.

    And, there are several different SOC2 audit “scopes”. You should focus on “security” and “confidentiality”. A SOC2 audit is going to involve a bunch of business documentation, no matter how you set it up. We don’t care about that stuff here; we’re security engineers talking to other engineers. So we’re just going to focus on the security details; and, anyways, security is all your big-company sales prospects care about when they ask for SOC2 reports.

    Now, the practical advice: your Type 1 audit sets a bar for your future Type 2 audits. Eventually, you will meet an angry bear in the woods who will withhold your pilot project purchase order until you produce a Type 2. When that day comes, you will likely be happier if you did things in your Type 1 to make the Type 2 simpler. It is easy to add new security controls after your Type 1. To an auditor, more controls are more good. It is not as easy to remove security controls from your Type 1. The auditors may ask why you did that.

    The obvious conclusion is that you should consider minimizing, within reason, the security controls you choose to surface in your Type 1.

    At any rate, if you’re eventually going to SOC2, there’s an easy way to judge the ROI on security tasks: you want things that (1) are generally valuable in their own right, (2) things you can keep doing indefinitely, and (3) generate lots of evidence for the IRL.

    Single Sign-On

    Set up Okta. Link all your third party applications to it. Turn on mandatory 2FA. Disable non-Okta sign-ins on as many of the applications you can. Now write a short onboarding/offboarding script and keep a spreadsheet recording who you’ve offboarded.

    It doesn’t have to be Okta. It can be GSuite, or, if you’re a glutton for punishment, you can even run your own Shibboleth instance. What’s important is that there’s just one of them, and that as many of your applications as possible are linked to it.

    We would tell you to do this even if you weren’t SOC2’ing. You need an inventory of your applications, a single place to disable access for users, and a way to force 2FA in as many places as possible. The alternative is madness. Every CSO we’ve asked has SSO in their top 5 “first things they’d do at a new company”.

    We’re aware: this costs money. Okta will charge you per-user. They might even charge more for 2FA. And your SAAS subscriptions will tax you for asking for SSO. This is annoying. But the money probably isn’t material, and you should just spend it.

    As to why it matters for SOC2: it will probably address dozens of “Access Control” line items in the IRL. And “Access Control” is most of the audit. A big chunk of the meetings you take with your auditors might just be walking through your Okta setup, and many of those line items (like password lockout) are things your SSO will configure properly by default.

    PRs, Protected Branches, and CI/CD

    Enable Protected Branches for your master/deployment branches in Github. Set up a CI system and run all your deploys through it. Require reviews for PRs that merge to production. Make your CI run some tests; it probably doesn’t much matter to your auditor which tests.

    We probably didn’t even need to tell you this. It’s how most professional engineering shops already run. But if you’re a 3-person team, get it set up now and you won’t have to worry about it when your team sprawls and new process is a nightmare to establish.

    If most of the SOC2 security audit is about Access Control, most of the rest of it is about “Change Management”. The sane way to address Change Management is just to get everything into Github, and then use Github features.

    An easy bonus point you can pick up here: write a simple PR security checklist. “PRs need an extra sign-off if they involve changing any SQL queries” is an example of what might go on that checklist. Or, “extra review if this PR calls dangerouslySetInnerHTML”. It doesn’t matter how detailed the checklist is now; you just want to get the process started. You’ll learn (from your mistakes) what belongs on it over the long haul.

    Centralized Logging

    Sign up for or set up a centralized logging service. It doesn’t matter which. Pipe all your logs to it. Set up some alerts – again, at this stage, it doesn’t matter which; you just want to build up the muscle.

    Over the long term, a lot of your security monitoring is going to involve log munching. There are things you’ll want to build, like customer fraud detection, that will be much simpler if you can set up a log-grepping alert that just spits things onto a Slack channel. I feel like I shouldn’t need to sell anyone on centralized logging, but, here I am doing that.

    If most of SOC2 is Access Control and most of the rest of it is Change Management, then most of the rest of the rest is “Monitoring of Controls”, and a flexible centralized logging system is going to answer every monitoring control item your auditor asks about.

    If you have a control check that runs regularly and produces a file or some JSON, you can also just put it in S3, and then go turn versioning on. Presto: a fossil record. We do this for a lot of Latacora’s controls and auditors seem to love it. Does your policy say you’ll review it every X time? Great, keep a separate CloudTrail trail, with shorter retention, that records the S3 read events.
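
    A minimal sketch of that pattern with boto3 (the bucket name and key prefix here are hypothetical, and in practice you’d enable versioning once in Terraform rather than in application code):

    import json
    from datetime import datetime, timezone

    import boto3  # assumes AWS credentials are already configured

    s3 = boto3.client("s3")
    BUCKET = "example-control-evidence"  # hypothetical bucket name

    # One-time setup: with versioning on, every run of a control check is
    # preserved as its own object version: a fossil record for auditors.
    s3.put_bucket_versioning(
        Bucket=BUCKET,
        VersioningConfiguration={"Status": "Enabled"},
    )

    def record_control_check(name: str, result: dict) -> None:
        """Drop the JSON output of a control check into the versioned bucket."""
        s3.put_object(
            Bucket=BUCKET,
            Key=f"controls/{name}.json",
            Body=json.dumps(
                {"ran_at": datetime.now(timezone.utc).isoformat(), "result": result}
            ).encode(),
        )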

    Terraform or Something

    I’m assuming you’re using AWS, but you can transliterate to GCP in your head; it’s all the same.

    The AWS console is evil. Avoid the AWS console.

    Instead, deploy everything with Terraform (or whatever the cool kids are using, but it’s probably still just Terraform). Obviously, keep your Terraform configs in Github.

    You want all your infrastructure changes tracked, documented, and to be repeatable. This is why people use Terraform in the first place. Even for spare-time solo projects, Terraform is worth it. It will gross you out at times; it grosses everyone out, and they still use it, because it’s better than not having it.

    Of course, what we’re doing here for SOC2 is ensuring that your infrastructure work can draft off the Access Control and Change Management your development work already uses.

    CloudTrail and AssumeRole

    Turn on CloudTrail. Use IAM accounts exclusively. Minimize the access you give directly to users, and use role assumption wherever possible. Users will call assume-role to get an AWS session that can do anything meaningful in AWS.

    You now have an audit log for engineering changes to AWS that in some sense signals the intent the engineer had when making the change. You also have a straightforward way to review the permissions each engineer has.
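
    As a sketch of what that looks like from an engineer’s side (the account ID, role ARN, and session name are hypothetical; aws-vault or the AWS CLI can do the same dance for you):

    import boto3  # assumes a low-privilege IAM user whose main right is sts:AssumeRole

    ROLE_ARN = "arn:aws:iam::123456789012:role/EngineerReadOnly"  # hypothetical role

    sts = boto3.client("sts")

    # The session name shows up in CloudTrail, which is what makes the audit log
    # express intent; add SerialNumber/TokenCode here if the role requires MFA.
    creds = sts.assume_role(
        RoleArn=ROLE_ARN,
        RoleSessionName="alice-debugging-ticket-1234",
        DurationSeconds=3600,
    )["Credentials"]

    # Everything done with this session is attributable to the assumed role.
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    print([b["Name"] for b in s3.list_buckets()["Buckets"]])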

    There are bonus points to be had here as well:

    First, set up multiple AWS Accounts; make CloudTrail logs from your sub-accounts feed into a parent account to which you grant limited access. The parent account doesn’t do much and is unlikely to be compromised. The sub-accounts do lots of random things and are where an AWS compromise is likely to actually happen. You’re now resilient to some of what might happen in a compromise.

    Second, force AWS MFA, and get your engineering team to use aws-vault. aws-vault stores AWS credentials in your OS keychain and mints new sessions from them as needed. It’s actually easier to use than the AWS “credentials” file, and much more secure.

    Honestly, you probably don’t have to do all of this stuff to clear a SOC2 audit. We’ve seen organizations where everyone just has AdministratorAccess clear audits without a problem. But that just goes to our theme: things that have high value for a startup, from the jump, and will address auditor questions. This stuff has high ROI and you should just do it.

    MDM

    You’re probably all Macbooks. So install Jamf Pro on all your laptops. Use it to require hard disk encryption and push out OS updates and whatever standard security tooling you come up with.

    Unlike the previous item, which wasn’t a requirement for SOC2, this probably is. Or rather, something like it is, and MDM (which is what Jamf is) is the friendliest way to satisfy the requirements.

    From both a security and a compliance perspective, you need a way to keep a handle on all your endpoints. And it should go without saying that all your endpoints are owned by your startup, and nobody is using their personal laptops to get to AWS or your admin consoles. A basic rule of startup security engineering is that you want to minimize the number of situations where you could be demanding an after-incident filesystem image of an employee’s personal device, because, uncomfortable.

    For SOC2 purposes, MDM will address a lot of fiddly endpoint security questions. Simple example: your OS will have basic anti-malware controls. SOC2 cares about anti-malware (it’s one of the few technical security issues SOC2 knows about). You’d rather address auditor concerns by documenting how Jamf configures everyone’s OS to enable anti-malware, than by demonstrating that you’ve installed some god-awful Mac AV program on all your desktops.

    Another bonus point: standardize everyone on Chrome. We don’t have to grumble about why; we’ll just observe that this is what almost every security team does anyways. Now you can use Jamf to inventory all the Chrome Extensions that your team is running. Also, you get to be a lot less worried about browser exploits.

    Vendorsec

    And, speaking of Chrome Extensions, our last item: standing up a vendorsec program.

    Vendorsec sounds cool and mysterious but at this stage of the game it’s really just a fancy way to refer to a spreadsheet you’re going to keep. That spreadsheet will inventory all the third party applications – SAAS subscriptions, desktop applications, Chrome Extensions, and pretty much every other piece of software you use other than software development libraries.

    As you sign up for more SAAS subscriptions, start asking for security documentation. Two reasonable things to ask for: the report from their last public penetration test, and their SOC2 report (SOC2 is infectious, like the GPL; your auditors will want to see SOC2 reports from your vendors). Collect them in a Google Drive folder.

    You want to start thinking about how you’re going to vet vendors. For instance: you can come up with some basic rules about Chrome Extensions; “if it gets access to the DOM, it has to be run in its own Chrome Profile”. Or for desktop applications: “if it binds a local port, that port has to be covered in the machine’s firewall rules”. This is all good stuff and you should do as much of it as you can; you’ll thank yourself later.

    But to start with, you just want to be documenting things. Your auditor’s IRL will absolutely ask you for evidence about which third-party applications you’ve signed up for, and how you made decisions about them. Your vendorsec spreadsheet will also be a pretty valuable part of your “Risk Assessment”, which is itself just a big spreadsheet.

    There are other reasonable security engineering things you could get an early start on, and many of them would play well in an especially diligent SOC2 audit. For instance:

    VPN to Admin Console and SSH

    You could put your admin consoles behind a VPN, link the VPN to your SSO system, and then nobody gets any admin rights anywhere without going through SSO.

    Vulnerability Management

    You could start tracking the alerts Github gives you for vulnerable dependencies. Most won’t apply to you even if you run the affected component; track them anyways in a spreadsheet or something. Do some dry-run base image updates.

    Pentesting

    You could get a third-party penetration test. Any test will count as far as auditors are concerned. You can spend more to get an assessment that will actually matter for your team and find real bugs.

    Security Policy

    You could get boilerplate security policies and an employee handbook (your payroll provider may be able to give you a handbook). Most of the policies don’t matter in their particulars (it’s more important to have them than to have them be precisely tuned), but the Incident Response policy does matter; review and customize it. Get the policies and handbook signed by employees.

    Incident Response

    Any time anything security relevant happens, you could make a private Slack channel named for the incident. The most common incident will be “employee lost phone” or something, and the response will be “didn’t matter because phone was encrypted and access was revoked and logs were reviewed”.

    Access Reviews

    You could start holding a monthly meeting, recorded in a spreadsheet or whatever, reviewing everyone’s access to things (hopefully this is just a walk of your SSO config and IAM role permissions).

    This is all good stuff, and may be helpful while you’re getting audited (though again remember not to set a bar for yourself you don’t want to clear next year). But you don’t have to do all of these just to get a Type 1 Report.

    There are security things you can do that are great, but aren’t going to matter at all for SOC2. Do them when they make sense. Don’t do them because you think you have to in order to get SOC2’d.

    • An SSH CA: The cool kids use CAs instead of ad-hoc SSH keypairs to gate access to servers. This generates evidence, demonstrates control, and streamlines engineering. SSH CAs are great. But almost every SOC2’d company doesn’t have one.

    • Customer-Facing Security Features: If it makes sense, MFA for all customers is a great feature, but it’s not something you’re likely to get dinged for by auditors.

    • Host Hardening: You should harden your hosts, set up auditd or eBPF perf events, and tune up a seccomp-bpf profile. But your auditor won’t even know what you’re talking about when you try to explain this stuff.

    • Flow Logs: AWS will give you fine-grained connection logs of all the traffic between your instances. They cost money. If you have an actual plan for what you’re going to do with those logs so that they’re actually helpful to you, enable them. But auditors don’t know what flows are.

    There are security engineering things that are bad but that your SOC2 auditor, who has heard of them roughly the way most people have heard of capitalism and communism, might bring up anyway.

    Here it’s worth remembering the basic idea behind a SOC2: you are going to tell your auditors the security things you do. They are going to enact a sacred ritual written on vellum in blood on the windy heath where Coopers first met Lybrand, and then give you a questionnaire to fill out. When your auditor asks about these things, or puts them on your IRL, you are allowed to say “we don’t do that thing, and here is why”. You know more than they do about security engineering.

    Some of those things include:

    • IDS/IPS: Don’t install an IDS. You have logs and MDM and security groups and the evidence they generate will satisfy this concern. AWS GuardDuty pricing is ridiculous but, if worse comes to worst, it will emphatically satisfy this if someone complains. Your SOC2 auditor probably won’t, but some fussbudgety Fortune 500 compliance person will. Or, use the scrappy startup version: something like StreamAlert: much cheaper, a little less turnkey, and fussbudget reviewers can’t tell the two apart. Either way: not a priority for SOC2.

    • Firewalls: You’re in AWS; it’s built in and, in the common case, probably default-deny already.

    • Web Application Firewalls: Most SOC2’d firms don’t use them, and most WAFs don’t work at all.

    • Endpoint Protection and AV: You have MDM, the full complement of macOS/Windows security features, and a way to force them on; you’re covered. AV software is a nightmare; avoid it while you can.

    • Data Loss Prevention: You keep production data in your prod environment and document the cases where it leaves. You therefore have better DLP than Fortune 500 companies with expensive DLP tools. The DLP tools are awful; avoid them.

    • Threat Management Tools: Most of the people who tell you that threat feeds are important for SOC2 are selling threat feeds. Most of the rest are rationalizing a threat tool they purchased. Let that be their problem.

    • Risk Management Platforms: Risk management tools are essentially the SOC2 IRL made flesh. If you derive pleasure from filling out questionnaires, don’t let us yuck your yum. But you don’t need anything more formal than a spreadsheet and a git history to track the things that SOC2 auditors are going to ask about.

  • Stop Using Encrypted Email

    Email is unsafe and cannot be made safe. The tools we have today to encrypt email are badly flawed. Even if those flaws were fixed, email would remain unsafe. Its problems cannot plausibly be mitigated. Avoid encrypted email.

    Technologists hate this argument. Few of them specialize in cryptography or privacy, but all of them are interested in it, and many of them tinker with encrypted email tools.

    Most email encryption on the Internet is performative, done as a status signal or show of solidarity. Ordinary people don’t exchange email messages that any powerful adversary would bother to read, and for those people, encrypted email is LARP security. It doesn’t matter whether or not these emails are safe, which is why they’re encrypted so shoddily.

    But we have to consider more than the LARP cases. In providing encryption, we have to assume security does matter. Messages can be material to a civil case and subject to discovery. They can be subpoenaed in a law enforcement action. They safeguard life-altering financial transactions. They protect confidential sources. They coordinate resistance to oppressive regimes. It’s not enough, in these cases, to be “better than no encryption”. Without serious security, many of these messages should not be sent at all.

    The least interesting problems with encrypted email have to do with PGP. PGP is a deeply broken system. It was designed in the 1990s, and in the 20 years since it became popular, cryptography has advanced in ways that PGP has not kept up with. So, for example, it recently turned out to be possible for eavesdroppers to decrypt messages without a key, simply by tampering with encrypted messages. Most technologists who work with PGP don’t understand it at a low enough level to see what’s wrong with it. But that’s a whole other argument. Even after we replace PGP, encrypted email will remain unsafe.

    Here’s why.

    .

    If messages can be sent in plaintext, they will be sent in plaintext.

    Email is end-to-end unencrypted [1] by default. The foundations of electronic mail are plaintext. All mainstream email software expects plaintext. In meaningful ways, the Internet email system is simply designed not to be encrypted.

    The clearest example of this problem is something every user of encrypted email has seen: the inevitable unencrypted reply. In any group of people exchanging encrypted emails, someone will eventually manage to reply in plaintext, usually with a quoted copy of the entire chain of email attached. This is tolerated, because most people who encrypt emails are LARPing. But in the real world, it’s an irrevocable disaster.

    Even if modern email tools didn’t make it difficult to encrypt messages, the Internet email system would still be designed to expect plaintext. It cannot enforce encryption. Unencrypted email replies will remain an ever-present threat.

    Serious secure messengers foreclose on this possibility. Secure messengers are encrypted by default; in many of the good ones, there’s no straightforward mechanism to send an unsafe message at all. This is table stakes.

    .

    Metadata is as important as content, and email leaks it.

    Leave aside the fact that the most popular email encryption tool doesn’t even encrypt subject lines, which are message content, not metadata.

    The email “envelope”, which includes the sender, the recipient, and timestamps, is unencrypted and always will be. Court cases (and lists of arrest targets) have been won or lost on little more than this. Internet email creates a durable log of metadata, one that every serious adversary is already skilled at accessing.

    The most popular modern secure messaging tool is Signal, which won the Levchin Prize at Real World Cryptography for its cryptographic privacy design. Signal currently requires phone numbers for all its users. It does this not because Signal wants to collect contact information for its users, but rather because Signal is allergic to it: using phone numbers means Signal can piggyback on the contact lists users already have, rather than storing those lists on its servers. A core design goal of the most important secure messenger is to avoid keeping a record of who’s talking to whom.

    Not every modern secure messenger is as conscientious as Signal. But they’re all better than Internet email, which doesn’t just collect metadata, but actively broadcasts it. Email on the Internet is a collaboration between many different providers, and each hop on its store-and-forward path is another point at which metadata is logged.

    .

    Every archived message will eventually leak.

    Most people email using services like Google Mail. One of the fundamental features of modern email is search, which is implemented by having the service provider keep a plaintext archive of email messages. Of the people who don’t use services like Google Mail, the majority use email client software that itself keeps a searchable archive. Ordinary people have email archives spanning years.

    Searchable archives are too useful to sacrifice, but for secure messaging, archival is an unreasonable default. Secure messaging systems make arrangements for “disappearing messages”. They operate from the premise that their users will eventually lose custody of their devices. Ask Ross Ulbricht why this matters.

    No comparable feature exists in email. Some email clients have obscure tools for automatically pruning archives, but there’s no way for me to reliably signal to a counterparty that the message I’m about to send should not be retained for more than 30 minutes. In reality, any email I send is likely to be archived forever. No matter how good a job one does securing their own data, their emails are always at the mercy of the least secure person they’ve sent them to.

    Tangent: the adoption of web mail services drastically reduces the security email can plausibly provide. For encryption to protect users, it must be delivered “end to end”, with encryption established directly between users, not between users and their mail server. There are, of course, web email services that purport to encrypt messages. But they store the encryption keys (or code and data sufficient to derive them). These systems obviously don’t work, as anyone with an account on Ladar Levison’s Lavabit mail service hopefully learned. The popularity of “encrypted” web mail services is further evidence of encrypted email’s real role as a LARPing tool.

    .

    Every long term secret will eventually leak.

    Forward secrecy is the property that a cryptographic key that is compromised in the future can’t easily be used to retroactively decrypt all previous messages. To accomplish this, we want two kinds of keys: an “identity” key that lives for weeks or months and “ephemeral” keys that change with each message. The long-lived identity key isn’t used to encrypt messages, but rather to establish the ephemeral keys. Compromise my identity key and you might read messages I send in the future, but not the ones I’ve sent in the past.

    Different tools do better and worse jobs of forward secrecy, but nothing does worse than encrypted Internet email, which not only demands of users that they keep a single long-term key, but begs them to publish those keys in public ledgers. Every new device a user of these systems buys and every backup they take is another opportunity for total compromise. Users are encouraged to rotate their PGP keys in the same way that LARPers are encouraged to sharpen their play swords: not only does nobody do it, but the whole system would probably fall apart if everyone did.

    .

    Technologists are clever problem solvers and these arguments are catnip to software developers. Would it be possible to develop a version of Internet email that didn’t have some of these problems? One that supported some kind of back-and-forth messaging scheme that worked in the background to establish message keys? Sure. But that system wouldn’t be Internet email. It would, at best, be a new secure messaging system, tunneled through and incompatible with all mainstream uses of email, only asymptotically approaching the security of the serious secure messengers we have now.

    What should people use instead?

    Real secure messaging software. The standard and best answer here is Signal, but there are others, and if the question is “should I use encrypted email or should I use a secure messenger”, we’re agnostic to which one you use. Or you can do more elaborate things. Magic Wormhole will securely exchange documents between people. age will encrypt documents that can be sent through less secure systems. These tools are all harder to use and more fraught than secure messengers, but they’re better than encrypted email.

    There are reasons people use and like email. We use email, too! It’s incredibly convenient. You can often guess people’s email addresses and communicate with them without ever being introduced. Every computing platform in the world supports it. Nobody needs to install anything new, or learn how to use a new system. Email is not going away.

    You can reasonably want email to be secure. Pray for a true peace in space! And we don’t object to email security features, like hop-by-hop TLS encryption and MTA-STS, that make the system more resistant to dragnet surveillance. But email cannot promise security, and so shouldn’t pretend to offer it. We need clarity about what kinds of systems are worthy of carrying secrets and which aren’t, or we end up with expert-run news publications with mail spools full of archived messages, many presumably from sources, along with a roadmap to all the people who sent those messages and upon whose operational security competence their safety depends. And that’s the best case.

    Stop using encrypted email.

    [1] End-to-end encryption, again, is the property that only the sender and the receiver need to trust each other, and that they need not trust their providers; it’s what you get when the senders and receivers generate their own keys, and those keys never leave their custody. Email is “encrypted” in a different sense, hop-by-hop, with TLS; Google will arrange with your mail server not to reveal plaintext on the wire. But Google and your mail server both get the plaintext of your messages. Hop-by-hop encryption is a good thing: it makes untargeted dragnet surveillance harder. But tools like PGP don’t make this kind of surveillance any harder, and a targeted attacker will still get access to mail servers and messages.

  • How (not) to sign a JSON object

    Last year we did a blog post on interservice auth. This post is mostly about authenticating consumers to an API. That’s a related but subtly different problem: you can probably impose more requirements on your internal users than your customers. The idea is the same though: you’re trying to differentiate between a legitimate user and an attacker, usually by getting the legitimate user to prove that they know a credential that the attacker doesn’t.

    You don’t really want a signature

    When cryptography engineers say “signature” they tend to mean something asymmetric, like RSA or ECDSA. Developers reach for asymmetric tools too often. There are a lot of ways to screw them up. By comparison, symmetric “signing” (a MAC) is easy to use and hard to screw up. HMAC is bulletproof and ubiquitous.

    Unless you have a good reason why you need an (asymmetric) signature, you want a MAC. If you really do want a signature, check out our Cryptographic Right Answers post to make that as safe as possible. For the rest of this blog post, “signing” means symmetrically, and in practice that means HMAC.

    How to sign a JSON object

    1. Serialize however you want.

    2. HMAC. With SHA256? Sure, whatever. We did a blog post on that too.

    3. Concatenate the tag with the message, maybe with a comma in between for easy parsing or something (see the sketch below).
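
    A minimal sketch of those three steps in Python (HMAC-SHA256, with the comma framing being our arbitrary choice rather than any standard):

    import hashlib
    import hmac
    import json

    def sign(key: bytes, obj) -> bytes:
        msg = json.dumps(obj).encode()  # 1. serialize however you want
        tag = hmac.new(key, msg, hashlib.sha256).hexdigest()  # 2. HMAC
        return tag.encode() + b"," + msg  # 3. tag, comma, message

    def verify(key: bytes, blob: bytes):
        tag, msg = blob.split(b",", 1)  # the hex tag never contains a comma
        expected = hmac.new(key, msg, hashlib.sha256).hexdigest().encode()
        if not hmac.compare_digest(tag, expected):  # constant-time comparison
            raise ValueError("bad tag")
        return json.loads(msg)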

    Wait, isn’t that basically a HS256 JWT?

    Shut up. Anyway, no, because you need to parse a header to read the JWT, so you inherit all of the problems that stem from that.

    How not to sign a JSON object, if you can help it

    Someone asked how to sign a JSON object “in-band”: where the tag is part of the very object you’re signing. That’s a niche use case, but it happens. You have a JSON object that a bunch of intermediate systems want to read and it’s important none of them mess with its contents. You can’t just send tag || json: that may be the cryptographically right answer, but now it’s not a JSON object anymore so third party services and middleboxes will barf. You also can’t get them to reliably pass the tag around as metadata (via a HTTP header or something). You need to put the tag on the JSON object, somehow, to “transparently” sign it. Anyone who cares about validating the signature can do so, and anyone who cares that the JSON object has a particular structure doesn’t break (because the blob is still JSON and it still has the data it’s supposed to have in all the familiar places).

    This problem sort-of reminds me of format-preserving encryption. I don’t mean that in a nice way, because there’s no nice way to mean that. Format-preserving encryption means you encrypt a credit card number and the result still sorta looks like a credit card number. It’s terrible and you only do it because you have to. Same with in-band JSON signing.

    As stated, in-band JSON signing means modifying a JSON object (e.g. removing the HMAC tag) and validating that it’s the same thing that was signed. You do that by computing the HMAC again and validating the result. Unfortunately there are infinitely many equal JSON objects with distinct byte-level representations (for some useful definition of equality, like Python’s builtin ==).

    Some of those differences are trivial, while others are fiendishly complicated. You can add as many spaces as you want between some parts of the grammar, like after the colon and before the value in an object. You can reorder the keys in an object. You can escape a character using a Unicode escape sequence (\u2603) instead of using the UTF-8 representation. “UTF-8” may be a serialization format for Unicode, but it’s not a canonicalization technique. If a character has multiple diacritics, they might occur in different orders. Some characters can be written as a base character plus a diacritic, but there’s also an equivalent single character. You can’t always know what the “right” character is out of context: is this the symbol for the unit of resistance (U+2126 OHM SIGN) or a Greek capital letter Omega (U+03A9)? Don’t even get me started on the different ways you can write the same floating point number!
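
    As a quick illustration of why that matters for signing (the key is hypothetical; Python’s == considers the parsed objects equal even though the bytes, and therefore the tags, differ):

    import hashlib
    import hmac
    import json

    key = b"hypothetical-secret"

    one = b'{"amount": 10, "unit": "\xce\xa9"}'   # raw UTF-8 omega, one key order
    two = b'{ "unit": "\\u03a9", "amount": 10 }'  # escaped omega, reordered, extra spaces

    assert json.loads(one) == json.loads(two)           # "the same" JSON object...
    assert hmac.new(key, one, hashlib.sha256).digest() != \
           hmac.new(key, two, hashlib.sha256).digest()  # ...with two different tags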

    Three approaches:

    1. Canonicalize the JSON.

    2. Add the tag and the exact string you signed to the object, validate the signature and then validate that the JSON object is the same as the one you got.

    3. Create an alternative format with an easier canonicalization than JSON.

    Canonicalization

    Canonicalization means taking an object and producing a unique representation for it. Two objects that mean the same thing (“are equal”) but are expressed differently canonicalize to the same representation.

    Canonicalization is a quagnet, which is a term of art in vulnerability research meaning quagmire and vulnerability magnet. You can tell it’s bad just by how hard it is to type ‘canonicalization’.

    My favorite canonicalization bug in recent memory is probably Kelby Ludwig’s SAML bug. Hold onto your butts, because this bug broke basically every SAML implementation under the sun in a masterful stroke. It used NameIds (SAML-speak for “the entity this assertion is about”) that look like this:

    <NameId>barney@latacora.com<!---->.evil.com</NameId>
    

    The common canonicalization strategy (“exc-c14n”) will remove comments, so the signature-validating side sees “barney@latacora.com.evil.com”. The common parsing strategy (“yolo”) disagrees, and sees a text node, a comment, and another text node. Since everyone is expecting a NameId to have one text node, you grab the first one. But that says barney@latacora.com, which isn’t what the IdP signed or what your XML-DSIG library validated.

    Not to worry: we said we were doing JSON, and JSON is not XML. It’s simpler! Right? There are at least two specs here: Canonical JSON (from OLPC) and an IETF draft (https://tools.ietf.org/id/draft-rundgren-json-canonicalization-scheme-05.html). They work? Probably? But they’re not fun to implement.

    Include the exact thing you’re signing

    If you interpret the problem as “to validate a signature I need an exact byte representation of what to sign” and canonicalization is just the default mechanism for getting to an exact byte representation, you could also just attach a specific byte serialization to the object with a tag for it.

    You validate the tag matches the specific serialization, and then you validate that the specific serialization matches the outside object with the tag and specific serialization removed. The upside is that you don’t need to worry about canonicalization; the downside is your messages are about twice the size that they need to be. You can maybe make that a little better with compression, since the repeated data is likely to compress well.
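
    A sketch of that approach, with field names (__signed, __tag) that are our own invention:

    import hashlib
    import hmac
    import json

    def wrap(key: bytes, obj: dict) -> dict:
        serialized = json.dumps(obj)  # the exact bytes the tag covers
        tag = hmac.new(key, serialized.encode(), hashlib.sha256).hexdigest()
        return dict(obj, __signed=serialized, __tag=tag)

    def unwrap(key: bytes, signed: dict) -> dict:
        outer = dict(signed)
        serialized = outer.pop("__signed")
        tag = outer.pop("__tag")
        expected = hmac.new(key, serialized.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(tag, expected):  # tag matches the serialization
            raise ValueError("bad tag")
        if json.loads(serialized) != outer:  # serialization matches the rest of the object
            raise ValueError("object does not match the signed serialization")
        return outer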

    The regex bait and switch trick

    If you interpret the problem as being about already having a perfectly fine serialization to compute a tag over, but the JSON parser/serializer roundtrip screwing it up after you compute the tag, you might try to do something to the serialized format that doesn’t know it’s JSON. This is a variant of the previous approach: you’re just not adding a second serialization to compute the tag over.

    The clever trick here is to add a field of the appropriate size for your tag with a well-known fake value, then HMAC, then swap the value. For example, if you know the tag is HMAC-SHA256, your tag size is 256 bits aka 32 bytes aka 64 hex chars. You add a unique key (something like __hmac_tag) with a value of 64 well-known bytes, e.g. 64 ASCII zero bytes. Serialize the object and compute its HMAC. If you document some subset of JSON serialization (e.g. where CRLFs can occur or where extra spaces can occur), you know that the string "__hmac_tag": "000..." will occur in the serialized byte stream. Now, you can use string replacement to shiv in the real HMAC value. Upon receipt, the decoder finds the tag, reads the HMAC value, replaces it with zeroes, computes the expected tag and compares against the previously read value.

    Because there’s no JSON roundtripping, the parser can’t mess up the JSON object’s specific serialization. The key needs to be unique because of course the string replacement or regular expression doesn’t know how to parse JSON.
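
    A rough sketch of the swap, assuming we control the serializer enough to know exactly how the placeholder gets written out:

    import hashlib
    import hmac
    import json

    PLACEHOLDER = "0" * 64  # 64 hex chars: same length as a real HMAC-SHA256 tag

    def sign(key: bytes, obj: dict) -> bytes:
        serialized = json.dumps(dict(obj, __hmac_tag=PLACEHOLDER))
        tag = hmac.new(key, serialized.encode(), hashlib.sha256).hexdigest()
        # String replacement, not JSON manipulation: the bytes never get re-serialized.
        return serialized.replace(PLACEHOLDER, tag, 1).encode()

    def verify(key: bytes, blob: bytes) -> dict:
        text = blob.decode()
        tag = json.loads(text)["__hmac_tag"]         # read the tag out of the received bytes
        zeroed = text.replace(tag, PLACEHOLDER, 1)   # swap the zeroes back in, byte for byte
        expected = hmac.new(key, zeroed.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("bad tag")
        return json.loads(text)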

    This feels weirdly gross? But at the same time probably less annoying than canonicalization. And it doesn’t work if any of the middleboxes modify the JSON through a parse/re-serialize cycle.

    An alternative format

    If you interpret the problem as “canonicalization is hard because JSON is more complex than what I really want to sign”, you might think the answer is to reformat the data you want to sign in a format where canonicalization is easy or even automatic. AWS Signatures do this: there’s a serialization format that’s far less flexible than JSON where you put some key parameters, and then you HMAC that. (There’s an interesting part to it where it also incorporates the hash of the exact message you’re signing – but we’ll get to that later.)

    This is particularly attractive if there’s a fixed set of simple values you have to sign, or more generally if the thing you’re signing has a predictable format.

    Request signing in practice

    Let’s apply this model to a case study of how request signing has worked through the years in some popular services. These are not examples of how to do it well, but rather cautionary tales.

    First off, AWS. AWS requires you to sign API requests. The current spec is “v4”, which tells you that there is probably at least one interesting version that preceded it.

    AWS Signing v1

    Let’s say an AWS operation CreateWidget takes attribute Name which can be any ASCII string. It also takes an attribute Unsafe, which is false by default and the attacker wishes were true. V1 concatenates the key-value pairs you’re signing, so something like Operation=CreateWidget&Name=iddqd became OperationCreateWidgetNameiddqd. You then signed the resulting string using HMAC.

    The problem with this is if I can get you to sign messages for creating widgets with arbitrary names, I can get you to sign operations for arbitrary CreateWidget requests: I just put all the extra keys and values I want in the value you’re signing for me. For example, the request signature for creating a widget named iddqdUnsafetrue is exactly the same as a request signature for creating a widget named iddqd with Unsafe equal to true: OperationCreateWidgetNameiddqdUnsafetrue.
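
    A toy reproduction of that collision (the scheme below is a caricature of V1 signing, not AWS’s actual implementation):

    import hashlib
    import hmac

    def v1_style_sign(key: bytes, params) -> str:
        # Caricature of V1: concatenate keys and values with no delimiter, then HMAC.
        msg = "".join(k + v for k, v in params)
        return hmac.new(key, msg.encode(), hashlib.sha256).hexdigest()

    key = b"hypothetical-secret"

    benign = [("Operation", "CreateWidget"), ("Name", "iddqdUnsafetrue")]
    evil = [("Operation", "CreateWidget"), ("Name", "iddqd"), ("Unsafe", "true")]

    # Both flatten to "OperationCreateWidgetNameiddqdUnsafetrue": same string, same tag.
    assert v1_style_sign(key, benign) == v1_style_sign(key, evil)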

    AWS Signing V2

    Security-wise: fine.

    Implementation-wise: it’s limited to query-style requests (query parameters for GET, x-www-form-urlencoded for POST bodies) and doesn’t support other methods, let alone non-HTTP requests. Sorting request parameters is a burden for big enough requests. Nothing for chunked requests either.

    (Some context: even though most AWS SDKs present you with a uniform interface, there are several different protocol styles in use within AWS. For example, EC2 and S3 are their own thing, some protocols use Query Requests (basically query params in GET queries and POST formencoded bodies), others use REST+JSON, some use REST+XML… There’s even some SOAP! But I think that’s on its way out.)

    AWS Signing V3

    AWS doesn’t seem to like V3 very much. The “what’s new in V4” document all but disavows its existence, and no live services appear to implement it. It had some annoying problems, like distinguishing between signed and unsigned headers (leaving the service to figure it out) and devolving to effectively a bearer token when used over TLS (which is great, as long as it actually gets used over TLS).

    Given how AWS scrubbed it away, it’s hard to say anything with confidence. I’ve found implementations, but that’s not good enough: an implementation may only use a portion of the spec while the badness can be hiding in the rest.

    AWS Signing V4

    Security-wise: fine.

    Addressed some problems noted in V2; for example: just signs the raw body bytes and doesn’t care about parameter ordering. This is pretty close to the original recommendation: don’t do inline signing at all, just sign the exact message you’re sending and put a MAC tag on the outside. A traditional objection is that several equivalent requests would have a different representation, e.g. the same arguments but in a different order. It just turns out that in most cases that doesn’t matter, and API auth is one of those cases.

    Also note that all of these schemes are really outside signing, but they’re still interesting because they had a lot of the problems you see on an inline signing scheme (they were just mostly unforced errors).

    AWS Signing V0

    For completeness. It is even harder to find than V3: you have to spelunk some SDKs for it. I hear it might have been HMAC(k, service || operation || timestamp), so it didn’t really sign much of the request.

    Flickr’s API signing

    One commonality of the AWS vulnerabilities is that none of them attacked the primitive. All of them used HMAC, and HMAC has always been safe. Flickr had exactly the same bug as AWS V1 signing, but also used a bad MAC. The tag you sent was MD5(secret + your_concatenated_key_value_pairs). We’ll leave the details of length extension attacks for a different time, but the punchline is that if you know the value of H(secret + message) and don’t know secret, you get to compute H(secret + message + glue + message2), where glue is some binary nonsense and message2 is an arbitrary attacker-controlled string.

    A typical protocol where this gets exploited looks somewhat like query parameters. The simplest implementation will just loop over every key-value pair and assign the value into an associative array. So if you have user=lvh&role=user, I might be able to extend that to a valid signature for user=lvh&role=userSOMEBINARYGARBAGE&role=admin.
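
    Here’s a sketch of that parse loop and why the extension pays off (the binary glue is elided; what matters is that the last duplicate key wins):

    from urllib.parse import unquote

    def naive_parse(qs: str) -> dict:
        # The "simplest implementation" from above: later values overwrite earlier ones.
        params = {}
        for pair in qs.split("&"):
            k, _, v = pair.partition("=")
            params[unquote(k)] = unquote(v)
        return params

    original = "user=lvh&role=user"
    # A length extension attack lets the attacker append to the signed message
    # without knowing the secret; the appended suffix re-declares role.
    extended = original + "SOMEBINARYGARBAGE&role=admin"

    assert naive_parse(original)["role"] == "user"
    assert naive_parse(extended)["role"] == "admin"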

    Conclusion

    • Just go ahead and always enforce TLS for your APIs.

    • Maybe you don’t need request signing? A bearer token header is fine, or HMAC(k, timestamp) if you’re feeling fancy, or mTLS if you really care.

    • Canonicalization is fiendishly difficult.

    • Add a signature on the outside of the request body, make sure the request body is complete, and don’t worry about “signing what is said versus what is meant” – it’s OK to sign the exact byte sequence.

    • The corollary here is that it’s way harder to do request signing for a REST API (where stuff like headers and paths and methods matter) than it is to do signing for an RPC-like API.

  • The PGP Problem

    Cryptography engineers have been tearing their hair out over PGP’s deficiencies for (literally) decades. When other kinds of engineers get wind of this, they’re shocked. PGP is bad? Why do people keep telling me to use PGP? The answer is that they shouldn’t be telling you that, because PGP is bad and needs to go away.

    There are, as you’re about to see, lots of problems with PGP. Fortunately, if you’re not morbidly curious, there’s a simple meta-problem with it: it was designed in the 1990s, before serious modern cryptography. No competent crypto engineer would design a system that looked like PGP today, nor tolerate most of its defects in any other design. Serious cryptographers have largely given up on PGP and don’t spend much time publishing on it anymore (with a notable exception). Well-understood problems in PGP have gone unaddressed for over a decade because of this.

    Two quick notes: first, we wrote this for engineers, not lawyers and activists. Second: “PGP” can mean a bunch of things, from the OpenPGP standard to its reference implementation in GnuPG. We use the term “PGP” to cover all of these things.

    The Problems

    Absurd Complexity

    For reasons none of us here in the future understand, PGP has a packet-based structure. A PGP message (in a “.asc” file) is an archive of typed packets. There are at least 8 different ways of encoding the length of a packet, depending on whether you’re using “new” or “old” format packets. The “new format” packets have variable-length lengths, like BER (try to write a PGP implementation and you may wish for the sweet release of ASN.1). Packets can have subpackets. There are overlapping variants of some packets. The most recent keyserver attack happened because GnuPG accidentally went quadratic in parsing keys, which also follow this deranged format.

    That’s just the encoding. The actual system doesn’t get simpler. There are keys and subkeys. Key IDs and key servers and key signatures. Sign-only and encrypt-only. Multiple “key rings”. Revocation certificates. Three different compression formats. This is all before we get to smartcard support.

    Swiss Army Knife Design

    If you’re stranded in the woods and, I don’t know, need to repair your jean cuffs, it’s handy if your utility knife has a pair of scissors. But nobody who does serious work uses their multitool scissors regularly.

    A Swiss Army knife does a bunch of things, all of them poorly. PGP does a mediocre job of signing things, a relatively poor job of encrypting them with passwords, and a pretty bad job of encrypting them with public keys. PGP is not an especially good way to securely transfer a file. It’s a clunky way to sign packages. It’s not great at protecting backups. It’s a downright dangerous way to converse in secure messages.

    Back in the MC Hammer era from which PGP originates, “encryption” was its own special thing; there was one tool to send a file, or to back up a directory, and another tool to encrypt and sign a file. Modern cryptography doesn’t work like this; it’s purpose built. Secure messaging wants crypto that is different from secure backups or package signing.

    Mired In Backwards Compatibility

    PGP predates modern cryptography; there are Hanson albums that have aged better. If you’re lucky, your local GnuPG defaults to 2048-bit RSA, the 64-bit-block CAST5 cipher in CFB, and the OpenPGP MDC checksum (about which more later). If you encrypt with a password rather than with a public key, the OpenPGP protocol specifies PGP’s S2K password KDF. These are, to put it gently, not the primitives a cryptography engineer would select for a modern system.

    We’ve learned a lot since Steve Urkel graced the airwaves during ABC’s TGIF: that you should authenticate your ciphertexts (and avoid CFB mode) would be an obvious example, but also that 64-bit block ciphers are bad, that we can do much better than RSA, that mixing compression and encryption is dangerous, and that KDFs should be both time- and memory-hard.

    Whatever the OpenPGP RFCs may say, you’re probably not doing any of these things if you’re using PGP, nor can you predict when you will. Take AEAD ciphers: the Rust-language Sequoia PGP defaulted to the AES-EAX AEAD mode, which is great, and nobody can read those messages because most PGP installs don’t know what EAX mode is, which is not great. Every well-known bad cryptosystem eventually sprouts an RFC extension that supports curves or AEAD, so that its proponents can claim on message boards that they support modern cryptography. RFCs don’t matter: only the installed base does. We’ve understood authenticated encryption for 2 decades, and PGP is old enough to buy me drinks; enough excuses.

    You can have backwards compatibility with the 1990s or you can have sound cryptography; you can’t have both.

    Obnoxious UX

    We can’t say this any better than Ted Unangst:

    There was a PGP usability study conducted a few years ago where a group of technical people were placed in a room with a computer and asked to set up PGP. Two hours later, they were never seen or heard from again.

    If you’d like empirical data of your own to back this up, here’s an experiment you can run: find an immigration lawyer and talk them through the process of getting Signal working on their phone. You probably don’t suddenly smell burning toast. Now try doing that with PGP.

    Long-Term Secrets

    PGP begs users to keep a practically-forever root key tied to their identity. It does this by making keys annoying to generate and exchange, by encouraging “key signing parties”, and by creating a “web of trust” where keys depend on other keys.

    Long term keys are almost never what you want. If you keep using a key, it eventually gets exposed. You want the blast radius of a compromise to be as small as possible, and, just as importantly, you don’t want users to hesitate even for a moment at the thought of rolling a new key if there’s any concern at all about the safety of their current key.

    The PGP cheering section will immediately reply “that’s why you keep keys on a Yubikey”. To a decent first approximation, nobody in the whole world uses the expensive Yubikeys that do this, and you can’t imagine a future in which that changes (we can barely get U2F rolled out, and those keys are disposable). We can’t accept bad cryptosystems just to make Unix nerds feel better about their toys.

    Broken Authentication

    More on PGP’s archaic primitives: way back in 2000, the OpenPGP working group realized they needed to authenticate ciphertext, and that PGP’s signatures weren’t accomplishing that. So OpenPGP invented the MDC system: PGP messages with MDCs attach a SHA-1 of the plaintext to the plaintext, which is then encrypted (as normal) in CFB mode.

    If you’re wondering how PGP gets away with this when modern systems use relatively complex AEAD modes (why can’t everyone just tack a SHA-1 to their plaintext), you’re not alone. Where to start with this Rube Goldberg contraption? The PGP MDC can be stripped off messages: it was encoded in such a way that you can simply chop off the last 22 bytes of the ciphertext to do that. To retain backwards compatibility with insecure older messages, PGP introduced a new packet type to signal that the MDC needs to be validated; if you use the wrong type, the MDC doesn’t get checked. Even if you do, the new SEIP packet format is close enough to the insecure SE format that you can potentially trick readers into downgrading; Trevor Perrin worked the SEIP out to 16 whole bits of security.

    And, finally, even if everything goes right, the reference PGP implementation will (wait for it) release unauthenticated plaintext to callers, even if the MDC doesn’t match.

    Incoherent Identity

    PGP is an application. It’s a set of integrations with other applications. It’s a file format. It’s also a social network, and a subculture.

    PGP pushes the notion of a cryptographic identity. You generate a key, save it in your keyring, print its fingerprint on your business card, and publish it to a keyserver. You sign other people’s keys. They in turn may or may not rely on your signatures to verify other keys. Some people go out of their way to meet other PGP users in person to exchange keys and more securely attach themselves to this “web of trust”. Other people organize “key signing parties”. The image you’re conjuring in your head of all that accurately explains how hard it is for PGP’s devotees to switch to newer stuff.

    None of this identity goop works. Not the key signing web of trust, not the keyservers, not the parties. Ordinary people will trust anything that looks like a PGP key no matter where it came from – how could they not, when even an expert would have a hard time articulating how to evaluate a key? Experts don’t trust keys they haven’t exchanged personally. Everyone else relies on centralized authorities to distribute keys. PGP’s key distribution mechanisms are theater.

    Leaks Metadata

    Forget the email debacle for a second (we’ll get to that later). PGP by itself leaks metadata. Messages are (in normal usage) linked directly to key identifiers, which are, throughout PGP’s cobweb of trust, linked to user identity. Further, a rather large fraction of PGP users make use of keyservers, which can themselves leak to the network the identities of which PGP users are communicating with each other.

    No Forward Secrecy

    A good example of that last problem: secure messaging crypto demands forward secrecy. Forward secrecy means that if you lose your key to an attacker today, they still can’t go back and read yesterday’s messages; they had to be there with the key yesterday to read them. In modern cryptography engineering, we assume our adversary is recording everything, into infinite storage. PGP’s claimed adversaries include world governments, many of whom are certainly doing exactly that. Against serious adversaries and without forward secrecy, breaches are a question of “when”, not “if”.

    To get forward secrecy in practice, you typically keep two secret keys: a short-term session key and a longer-term trusted key. The session key is ephemeral (usually the product of a DH exchange) and the trusted key signs it, so that a man-in-the-middle can’t swap their own key in. It’s theoretically possible to achieve a facsimile of forward secrecy using the tools PGP provides. Of course, pretty much nobody does this.
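
    To make the pattern concrete, here’s a minimal sketch of “ephemeral key signed by a long-term key” using the pyca/cryptography library. This isn’t a PGP feature and it isn’t a complete protocol (no transcript hashing, no ratcheting); the names and the HKDF info string are made up for illustration.

    # A long-term Ed25519 identity key signs a fresh X25519 key per session.
    # Compromising the identity key later doesn't recover old session keys,
    # because the ephemeral private keys are thrown away after use.
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import ed25519, x25519
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    identity_key = ed25519.Ed25519PrivateKey.generate()     # long-term, trusted

    def new_session_offer():
        eph = x25519.X25519PrivateKey.generate()             # ephemeral, per session
        eph_pub = eph.public_key().public_bytes(
            serialization.Encoding.Raw, serialization.PublicFormat.Raw)
        return eph, eph_pub, identity_key.sign(eph_pub)       # bind it to our identity

    def session_key(my_eph, their_eph_pub, their_identity_pub, their_sig):
        # Check the peer's ephemeral key was signed by their long-term key,
        # then derive a symmetric key from the DH shared secret.
        their_identity_pub.verify(their_sig, their_eph_pub)
        shared = my_eph.exchange(x25519.X25519PublicKey.from_public_bytes(their_eph_pub))
        return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                    info=b"example-session").derive(shared)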

    Clumsy Keys

    An OpenBSD signify(1) public key is a Base64 string short enough to fit in the middle of a sentence in an email; the private key, which isn’t an interchange format, is just a line or so longer. A PGP public key is a whole giant Base64 document; if you’ve used them often, you’re probably already in the habit of attaching them rather than pasting them into messages so they don’t get corrupted. Signify’s key is a state-of-the-art Ed25519 key; PGP’s is a weaker RSA key.

    You might think this stuff doesn’t matter, but it matters a lot; orders of magnitude more people use SSH and manage SSH keys than use PGP. SSH keys are trivial to handle; PGP’s are not.

    Negotiation

    PGP supports ElGamal. PGP supports RSA. PGP supports the NIST P-Curves. PGP supports Brainpool. PGP supports Curve25519. PGP supports SHA-1. PGP supports SHA-2. PGP supports RIPEMD160. PGP supports IDEA. PGP supports 3DES. PGP supports CAST5. PGP supports AES. There is no way this is a complete list of what PGP supports.

    If we’ve learned 3 important things about cryptography design in the last 20 years, at least 2 of them are that negotiation and compatibility are evil. The flaws in cryptosystems tend to appear in the joinery, not the lumber, and expansive crypto compatibility increases the amount of joinery. Modern protocols like TLS 1.3 are jettisoning backwards compatibility with things like RSA, not adding it. New systems support just a single suite of primitives, and a simple version number. If one of those primitives fails, you bump the version and chuck the old protocol all at once.

    If we’re unlucky, and people are still using PGP 20 years from now, PGP will be the only reason any code anywhere includes CAST5. We can’t say this more clearly or often enough: you can have backwards compatibility with the 1990s or you can have sound cryptography; you can’t have both.

    Janky Code

    The de facto standard implementation of PGP is GnuPG. GnuPG is not carefully built. It’s a sprawling C-language codebase with duplicative functionality (write-ups of the most recent SKS key parsing denial of service noted that it has multiple key parsers, for instance) with a long track record of CVEs ranging from memory corruption to cryptographic side channels. It has at times been possible to strip authenticators off messages without GnuPG noticing. It’s been possible to feed it keys that don’t fingerprint properly without it noticing. The 2018 Efail vulnerability was a result of it releasing unauthenticated plaintext to callers. GnuPG is not good.

    GnuPG is also effectively the reference implementation for PGP, and also the basis for most other tools that integrate PGP cryptography. It isn’t going anywhere. To rely on PGP is to rely on GPG.

    The Answers

    One of the rhetorical challenges of persuading people to stop using PGP is that there’s no one thing you can replace it with, nor should there be. What you should use instead depends on what you’re doing.

    Talking To People

    Use Signal. Or Wire, or WhatsApp, or some other Signal-protocol-based secure messenger.

    Modern secure messengers are purpose-built around messaging. They use privacy-preserving authentication handshakes, repudiable messages, cryptographic ratchets that rekey on every message exchange, and, of course, modern encryption primitives. Messengers are trivially easy to use and there’s no fussing over keys and subkeys. If you use Signal, you get even more than that: you get a system so paranoid about keeping private metadata off servers that it tunnels Giphy searches to avoid traffic analysis attacks, and until relatively recently didn’t even support user profiles.

    Encrypting Email

    Don’t.

    Email is insecure. Even with PGP, it’s default-plaintext, which means that even if you do everything right, some totally reasonable person you mail, doing totally reasonable things, will invariably CC the quoted plaintext of your encrypted message to someone else (we don’t know a PGP email user who hasn’t seen this happen). PGP email is forward-insecure. Email metadata, including the subject (which is literally message content), is always plaintext.

    If you needed another reason, read the Efail paper. The GnuPG community, which mishandled the Efail disclosure, talks this research down a lot, but it was accepted at Usenix Security (one of the top academic software security venues) and at Black Hat USA (the top industry software security venue), was one of the best cryptographic attacks of the last 5 years, and is a pretty devastating indictment of the PGP ecosystem. As you’ll see from the paper, S/MIME isn’t better.

    This isn’t going to get fixed. To make actually-secure email, you’d have to tunnel another protocol over email (you’d still be conceding traffic analysis attacks). At that point, why bother pretending?

    Encrypting email is asking for a calamity. Recommending email encryption to at-risk users is malpractice. Anyone who tells you it’s secure to communicate over PGP-encrypted email is putting their weird preferences ahead of your safety.

    Sending Files

    Use Magic Wormhole. Wormhole clients use a one-time password-authenticated key exchange (PAKE) to encrypt files to recipients. It’s easy (for nerds, at least), secure, and fun: we haven’t introduced wormhole to anyone who didn’t start gleefully wormholing things immediately just like we did.

    Someone stick a Windows installer on a Go or Rust implementation of Magic Wormhole right away; it’s too great for everyone not to have.

    If you’re working with lawyers and not with technologists, Signal does a perfectly cromulent job of securing file transfers. Put a Signal number on your security page to receive bug bounty reports, not a PGP key.

    Encrypting Backups

    Use Tarsnap. Colin can tell you all about how Tarsnap is optimized to protect backups. Or really, use any other encrypted backup tool that lots of other people use; they won’t be as good as Tarsnap but they’ll all do a better job than PGP will.

    Need offline backups? Use encrypted disk images; they’re built into modern Windows, Linux, and macOS. Full disk encryption isn’t great, but it works fine for this use case, and it’s easier and safer than PGP.

    Signing Packages

    Use Signify/Minisign. Ted Unangst will tell you all about it. It’s what OpenBSD uses to sign packages. It’s extremely simple and uses modern Ed25519 signatures. Minisign, from Frank Denis, the libsodium guy, brings the same design to Windows and macOS; it has bindings for Go, Rust, Python, Javascript, and .NET; it’s even compatible with Signify.

    Encrypting Application Data

    Use libsodium. It builds everywhere, has an interface that’s designed to be hard to misuse, and you won’t have to shell out to a binary to use it.

    Encrypting Files

    This really is a problem. If you’re not making a backup, and you’re not archiving something offline for long-term storage, and you’re not encrypting in order to securely send the file to someone else, and you’re not encrypting virtual drives that you mount/unmount as needed to get work done, then there’s no one good tool that does this now. Filippo Valsorda is working on “age” for these use cases, and I’m super optimistic about it, but it’s not there yet.

    Update, February 2020

    Filippo’s age has been released. It’s a solid design with simple, easily auditable implementations in Go and Rust. You can build binaries for it for every mainstream platform. Age is, of course, much younger than PGP. But I would bet all the money in my pocket against all the money in yours that a new vulnerability will be found in the clangorous contraption of PGP before one is found in age. Look into age!

    Hopefully it’s clear that this is a pretty narrow use case. We work in software security and handle sensitive data, including bug bounty reports (another super common “we need PGP!” use case), and we almost never have to touch PGP.

  • Analyzing a simple encryption scheme using GitHub SSH keys

    (This is an introductory level analysis of a scheme involving RSA. If you’re already comfortable with Bleichenbacher oracles you should skip it.)

    Someone pointed me at the following suggestion on the Internet for encrypting secrets to people based on their GitHub SSH keys. I like the idea of making it easier for people to leverage key material and tools they already have. The encryption instructions are:

    echo "my secret" > message.txt
    curl -q github.com/$USER.keys | head -n 1 > recipient.pub
    ssh-keygen -e -m pkcs8 -f recipient.pub > recipient.pem
    openssl rsautl -encrypt -pubin -inkey recipient.pem -ssl \
        -in message.txt -out encrypted.txt
    

    Anything using an openssl command line tool makes me a little uncomfortable. Let’s poke at it a little.

    We’ll assume that that first key is really an RSA key and we don’t have to worry about EdDSA or ECDSA (or heaven forbid, DSA). You’re encrypting a password for someone. The straightforward threat model is an attacker who has the public key and ciphertext (but no plaintext) and wants to decrypt the ciphertext.

    There are a few ways you can try to attack RSA schemes. You could attack the underlying math: maybe the keys were generated with insufficient entropy (e.g. the Debian weak SSH keys problem) or bogus prime generation (e.g. ROCA). In either case, you can recover the private key from just the public key. These keys off of GitHub were likely generated by OpenSSH on developer laptops, and hence are unlikely to have that sort of problem. It’s also not specific to this scheme. (A real attacker would still check.)

    Other attacks depend on the type of RSA padding used. The thing that sticks out about that openssl rsautl -encrypt is the -ssl flag. The man page claims:

    -pkcs, -oaep, -ssl, -raw

    the padding to use: PKCS#1 v1.5 (the default), PKCS#1 OAEP, special padding used in SSL v2 backwards compatible handshakes, or no padding, respectively. For signatures, only -pkcs and -raw can be used.

    iA! iA! I forgot that SSLv2 has its own weird padding variant: I remembered it as PKCS1v15 from the last time I looked (DROWN). After some source diving (thanks pbsd!) I figured out that backwards-compatible SSLv2 padding is like PKCS1v15, but the first 8 bytes of the random padding are 0x03 (PKCS1v15 code, SSLv2 code). That’s weird, but OK: let’s just say it’s weird PKCS1v15 and move on.

    PKCS1v15 and its SSLv2 variant are both vulnerable to Bleichenbacher’s oracle attack. That attack relies on being able to mess with a ciphertext and learn from how it fails decryption via an error message or a timing side channel. That doesn’t work here: this model is “offline”: the attacker gets a ciphertext and a public key, but they don’t get to talk to anything that knows how to decrypt. Hence, they don’t get to try to get it to decrypt maliciously modified ciphertexts either.

    There are lots of ways unpadded (“textbook”) RSA is unsafe, but one of them is that it’s deterministic. If c = m^e mod N and an attacker is given c and can guess a bunch of m, they know which m produced a particular c, and so they’ve decrypted the ciphertext. That sounds like a weird model at first, since the attacker comes up with m and just “confirms” it’s the right one. It would work here regardless: passwords are often low-entropy and can be enumerated; that’s the premise of modern password cracking.
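
    In code, the guess-and-check is a one-liner. A sketch, where e, N, and c stand for the public exponent, the modulus, and the captured ciphertext (placeholder names, not part of the original instructions):

    # Textbook RSA is deterministic: the same m always encrypts to the same c,
    # so an attacker can confirm guesses entirely offline.
    def confirm_guess(guess: bytes, e: int, N: int, c: int) -> bool:
        return pow(int.from_bytes(guess, "big"), e, N) == c

    # for guess in wordlist:
    #     if confirm_guess(guess, e, N, c): ...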

    But that’s raw RSA, not PKCS1v15 padding, which is EB = 00 || BT || PS || 00 || D (see the RFC), where BT is the block type (here 02 for public key encryption) and D is the data you’re encrypting. PS is the “padding string”, which is a little confusing because the entire operation is padding. It’s randomly generated when you’re doing an RSA encryption operation. If you call the maximum size we can run through RSA k (the modulus size in bytes), the maximum length for PS is k - len(D) - 3 (one byte for each null byte and one for the BT byte). The spec insists (and OpenSSL correctly enforces) that this be at least 8 bytes, or 64 bits of entropy. You can do about 50k public key operations per second on my dinky, virtualized, heavily power-throttled laptop. 64 bits is a bunch, but not infinity. That’s still not a very satisfactory result.

    But wait: it’s not PKCS1v15, it’s that weird SSLv2 padding, which sets the first 8 bytes of the padding string to 0x03 bytes. If the padding string is just 8 bytes long, that means the padding string is entirely determined. We can verify that by trying to encrypt a message of the appropriate size. For a 2048 bit RSA key, that’s 2048 // 8 == 256 bytes worth of modulus, 3 bytes worth of header bytes and 8 bytes worth of padding, so a 256 - 8 - 3 == 245 byte message. You can go check that any 245 byte message encrypts to the same ciphertext every time. There’s no lower bound on the amount of entropy in the ciphertext. A 244 byte message will encrypt to one of 255 ciphertexts: one for each possible nonzero padding byte.
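
    You can see that arithmetic by just building the encryption block yourself. A sketch of the SSLv2-style block as described above (not a library call; the layout follows the RFC plus the eight 0x03 bytes):

    import os

    def sslv2_encryption_block(data: bytes, modulus_bytes: int = 256) -> bytes:
        # EB = 00 || 02 || PS || 00 || D, where PS starts with eight 0x03 bytes
        # and any remaining padding bytes are random and nonzero.
        ps_len = modulus_bytes - len(data) - 3
        assert ps_len >= 8, "message too long"
        ps = b"\x03" * 8 + bytes(b % 255 + 1 for b in os.urandom(ps_len - 8))
        return b"\x00\x02" + ps + b"\x00" + data

    # A maximum-length (245 byte) message leaves room for only the eight fixed
    # 0x03 bytes, so the block, and hence the RSA ciphertext, is fully determined:
    msg = b"A" * 245
    assert sslv2_encryption_block(msg) == sslv2_encryption_block(msg)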

    Practically, is this still fine? Probably, but only within narrow parameters. If the message you’re encrypting is very close to 245 bytes and has plenty of structure known to the attacker, it isn’t. If I can get you to generate a lot of these (say, a CI system automating the same scheme), it won’t be. It’s the kind of crypto that makes me vaguely uncomfortable but you’ll probably get away with because there’s no justice in the world.

    There’s a straightforward way to improve this. Remember how I said -ssl was weird? Not specifying anything would’ve resulted in the better-still-not-great PKCS1v15 padding. If you are going to specify a padding strategy, specify -oaep. OAEP is the good RSA encryption padding. By default, OpenSSL uses SHA1 with it (for both the message digest and in MGF1), which is fine for this purpose. That gives you 160 bits of randomness, which ought to be plenty.
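
    If you’d rather not fight the openssl CLI at all, the OAEP version is a few lines of the pyca/cryptography library. A sketch, reusing the recipient.pem produced by the instructions above:

    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    with open("recipient.pem", "rb") as f:
        recipient = serialization.load_pem_public_key(f.read())

    # RSA-OAEP, with SHA-1 for both the digest and MGF1 (OpenSSL's default).
    ciphertext = recipient.encrypt(
        b"my secret",
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA1()),
                     algorithm=hashes.SHA1(), label=None))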

    This is why most schemes use RSA to encrypt a symmetric key.

    Future blog posts: how to fix this, creative scenarios in which we mess with those parameters so it breaks, and how to do the same with ECDSA/EdDSA keys. For the latter: I asked a smart person and the “obvious” thing to them was not the thing I’d have done. For EdDSA, my first choice would be to convert the public key from Ed25519 to X25519 and then use NaCl box. They’d use the EdDSA key directly. So, if you’re interested, we could do a post on that. And then we’d probably talk about why you use NIST P-256 for both signatures and ECDH, but different curve formats in djb country: Ed25519 for signatures and X25519 for ECDH. Oh, and we should do some entropy estimation. People often know how to do that for passwords, but for this threat model we also need to do that for English prose.

  • ROCA vs. ROBOT: An Eternal Golden Braid

    The ROCA RSA key generation flaw or ROBOT, the “Return Of Bleichenbacher” attack: which is more deserving of the “Best Cryptographic Attack” Pwnie award at the 2018 Black Hat USA conference? Only one can survive. Let us consider.

    Assume for the moment that it’s down to those two: ROBOT and ROCA. But first take a moment to consider the best cases for the “runners up”. They are all excellent; it was a very good year for crypto research.

    Efail

    The Efail attack broke PGP email. Also: S/MIME. All encrypted email! That is, by itself, a headlining cryptographic vulnerability. The case for Efail as Pwnie winner:

    • The Efail Pwnie might do the most good for the community of all the Pwnie winners. The encrypted email ecosystem is broken, and has been known to be broken for over a decade. The Efail researchers took vulnerabilities we knew about and weaponized them to break actual email clients. In this sense, Efail looks a lot like BEAST, which weaponized work Bard had done years earlier that nobody had paid attention to. The TLS ecosystem needed a kick in the ass to move away from SSL 3.0, and so too does the messaging community with PGP.

    • Efail is an elegant attack. It’s all exploit work! It’s based on the individuated quirks of a whole ecosystem of clients. It’s the cryptographic equivalent of the work clientside RCE people do memorizing all the offsets in Windows 7’s OLE DLLs.

    So why won’t Efail win? Because cryptographers didn’t take PGP email seriously to begin with.

    Among serious cryptography researchers, Efail was met with a shrug, not because the attack wasn’t important or powerful, but because cryptographers had written off the PGP and S/MIME ecosystems long before — and for all the reasons pointed out in the Efail paper.

    Assume, arguendo, that Efail is out of the running.

    IOTA

    IOTA de-pantsed a custom cryptosystem built for a crypto-as-in-currency. The case for IOTA:

    • It’s hi-larious. It is serious komedy gold. What are the Pwnies if not a key meant to unlock the safe in which our field keeps its joy?

    • It involves a relatively serious cryptographic undertaking, a real understanding of attacks on cryptographic primitives. We don’t get to break crypto primitives all that often! Even the worst cryptosystems in the world tend to use SHA-2 and AES. Not IOTA! They built their own hash function, optimized (somehow, I assume, from the marketing material) for computation in ternary. You’re a crypto pentester, you kind of dream of finding a project dumb enough to make up a new hash function.

    • “Optimized for ternary”. See point one.

    • The IOTA community and its response to the work. See point one. If IOTA wins, there will be jubilation.

    Why not IOTA? Are the Pwnies a serious thing or not? In a very boring year for cryptographic attacks you could make the case for “both”, but not this year. Take IOTA out of the running.

    KRACK

    KRACK breaks WPA2. Everyone uses WPA2. Obviously, KRACK should be a finalist:

    • KRACK leverages a nonce collision, which is practically as fundamental to cryptographic software as memory corruption is to software built in C and C++. A cryptographic researcher at a university might say: “yes, that’s why it shouldn’t win: it’s just another instance of a very well known attack”. Allow me to retort: everybody knows that you can’t copy a 200 byte string into a 100 byte buffer in a C program. We’ve known that since the mid-1990s. Is memory corruption dead? No! Attackers evolved, from noticing “hey, those extra bytes have to go somewhere” to “here is an elaborate sequence of steps, involving the order of allocations and frees and the way numbers are represented on an x86 processor, that takes a program that was coded defensively to avoid a class of attacks and revives that very attack against it”.

    • That’s KRACK. The WPA2 designers knew that you couldn’t simply repeat nonces during handshaking. KRACK figured out a way to trick them into doing that.

    • So I would argue: KRACK is the future of cryptographic vulnerability assessment: the recognition of a fundamental bug class and its application to systems built with that understanding. Every old crypto bug will become new again when someone figures out how to trick a target into reviving it.

    But, as with PGP, cryptography researchers wrote off WPA2 long ago. News flash: they’ve written off WPA3 as well! Good luck with those wireless networks.

    Which brings us to the main event: ROCA or ROBOT?

    Remember what the Pwnie for “Best Cryptographic Attack” represents. It’s “the most impactful cryptographic attack against real-world systems, protocols, or algorithms.” It’s not meant to be theoretical, but rather “requires actual pwnage”.

    In this corner: ROCA

    ROCA broke all the Yubikeys. Also, Estonia. There will be ROCA-vulnerable RSA keys hidden in mission-critical infrastructure systems for the next 20 years. The real-world impact of ROCA is immense.

    The problem with ROCA is that its exploit takes core-years to execute. It’s a real vulnerability, but it’s closer to theory than any previous Pwnie nomination.

    And in this corner: ROBOT

    ROBOT broke Facebook, Paypal, Cisco, a bunch of people running F5 middleboxes, Citrix, BouncyCastle, Erlang, WolfSSL, and Unisys ClearPath MCP. ClearPath! Someone finally broke it!

    The problem with ROBOT is that it’s cryptographically less interesting than ROCA. It exploits one of the better-known vulnerabilities in cryptography engineering: Bleichenbacher’s 1998 RSA oracle.

    FIGHT!

    ROCA is complicated. Complicated is good. The Pwnies are a celebration of elegant, high-degree-of-difficulty exploitation. ROCA is that. A lot of cryptography engineers who read the ROCA paper still don’t have their heads around the exploit.

    ROBOT is practical. Practical is good. The Pwnies are about “pwnage”; they’re about things that offensive security people can actually accomplish in the field, against real world systems. ROBOT broke the Unisys ClearPath MCP.

    ROCA is “practical” in a cryptographic sense. As a cryptosystem, the Infineon RSA generator it targets is a smoking crater. But put yourself in the shoes of a red team in 2018. Assume you’ve actually identified a vulnerable key to target. How long will it take you to factor that key? For a 2048 bit key, it’s around “100 CPU-years”.

    But ROCA is so bad that Estonia had to change its name and reissue new identity cards for the new nation of “post-ROCA Estonia”. All the Yubikey 4s got recalled. That’s impact. Impact is good.

    ROCA breaks hardware. Hardware is good. Exploit development against custom hardware is an elite skill. The Pwnies should celebrate elite skill. ROBOT took talent and finesse; the world is not full of Hanno Böcks finding systemic crypto vulnerabilities all across the Internet. But the degree of difficulty on ROCA is higher.

    On the other hand: ROCA affects just one hardware device. The error Infineon apparently made to wind up with the ROCA vulnerability is itself pretty elaborate. The bug was found during a survey of a large group of hardware and software RSA generators; Infineon was the only vendor with this problem. I could go into more detail here but the details are boring. No future vulnerability researcher is going to pull the ROCA paper out of their stack and find an equivalent vulnerability in a new target.

    ROBOT, different story. ROBOT is based on an older vulnerability, but the ROBOT research finally completes the weaponization of that vulnerability — not just in exploiting a single target in a single set of circumstances, but also in detecting it in the first place. In fact, in doing that, they found new ways to tickle the Bleichenbacher vulnerability, uncovering it in systems thought to be secure. The ROBOT methodology probably will get used by smart crypto testers in the future; it contributes to the craft in a broader way than ROCA.

    The crypto nerd in me wants ROCA to win.

    But if I put my “the spirit of the Pwnies” hat on, I’d probably have to give it to ROBOT.

  • The default OpenSSH key encryption is worse than plaintext

    The eslint-scope npm package got compromised recently, stealing npm credentials from your home directory. We started running tabletop exercises: what else would you smash-and-grab, and how can we mitigate that risk?

    Most people have an RSA SSH key laying around. That SSH key has all sorts of privileges: typically logging into prod and GitHub access. Unlike an npm credential, an SSH key is encrypted, so perhaps it’s safe even if it leaks? Let’s find out!

    user@work /tmp $ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/user/.ssh/id_rsa): mykey
    ...
    user@work /tmp $ head -n 5 mykey  
    -----BEGIN RSA PRIVATE KEY-----
    Proc-Type: 4,ENCRYPTED
    DEK-Info: AES-128-CBC,CB973D5520E952B8D5A6B86716C6223F
    
    +5ZVNE65kl8kwZ808e4+Y7Pr8IFstgoArpZJ/bkOs7rB9eAfYrx2CLBqLATk1RT/
    

    You can tell it’s encrypted because it says so right there. It also doesn’t start with MII – the base64 DER clue that an RSA key follows. And AES! That’s good, right? CBC with ostensibly a random IV, even! No MAC, but there’s nothing like a padding oracle here for an attacker to try modified ciphertexts against, so that might be OK?

    It’s tricky to find out what this DEK-Info stuff means. Searching the openssh-portable repo for the string DEK-Info only shows sample keys. The punchline is that the AES key is just MD5(password || IV[:8]). That’s not good at all: password storage best practice holds that passwords are bad (low entropy) and in order to turn them into cryptographic key material you need an expensive function like Argon2. MD5 is very cheap to compute. The only thing this design has going for it is that the salt goes after the password, so you can’t just precompute the intermediate MD5 state for the salt and try passwords from there. That’s faint praise, especially in a world where I can rent a machine that tries billions of MD5 calls per second. There just aren’t that many passwords.
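
    For the morbidly curious, the derivation is short enough to sketch. This mirrors the EVP_BytesToKey-style scheme described above; the password is obviously made up, and the IV is the one from the example key:

    import hashlib

    def legacy_pem_key(password: bytes, iv_hex: str) -> bytes:
        # DEK-Info: AES-128-CBC,<hex IV>. The "salt" is the first 8 bytes of the
        # IV, and the AES-128 key is a single MD5 over password || salt.
        iv = bytes.fromhex(iv_hex)
        return hashlib.md5(password + iv[:8]).digest()   # 16 bytes = AES-128 key

    key = legacy_pem_key(b"hunter2", "CB973D5520E952B8D5A6B86716C6223F")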

    You might ask yourself how OpenSSH ended up with this. The sad answer is the OpenSSL command line tool had it as a default, and now we’re stuck with it.

    So far, that’s a fair argument that standard password-encrypted keys are about as good as plaintext: the encryption is ineffective. But I made a stronger statement: it’s worse. The argument there is simple: an SSH key password is unlikely to be managed by a password manager; instead it’s something you remember. If you remember it, you probably reused it somewhere. Perhaps it’s even your device password. This leaked key provides an oracle: if I guess the password correctly (and that’s feasible because the KDF is bad), I know I guessed correctly because the decrypted key matches your public key.

    There’s nothing wrong with the RSA key pair itself: it’s just the symmetric encryption of the private key. You can’t mount this attack from just a public key.

    How do you fix this? OpenSSH has a new key format that you should use. “New” means 2013. This format uses bcrypt_pbkdf, which is essentially bcrypt with fixed difficulty, operated in a PBKDF2 construction. Conveniently, you always get the new format when generating Ed25519 keys, because the old SSH key format doesn’t support newer key types. That’s a weird argument: you don’t really need your key format to define how Ed25519 serialization works since Ed25519 itself already defines how serialization works. But if that’s how we get good KDFs, that’s not the pedantic hill I want to die on. Hence, one answer is ssh-keygen -t ed25519. If, for compatibility reasons, you need to stick to RSA, you can use ssh-keygen -o. That will produce the new format, even for old key types. You can upgrade existing keys with ssh-keygen -p -o -f PRIVATEKEY. If your keys live on a Yubikey or a smart card, you don’t have this problem either.

    We want to provide a better answer to this. On the one hand, aws-vault has shown the way by moving credentials off disk and into keychains. Another parallel approach is to move development into partitioned environments. Finally, most startups should consider not having long-held SSH keys, instead using temporary credentials issued by an SSH CA, ideally gated on SSO. Unfortunately this doesn’t work for GitHub.

    PS: It’s hard to find an authoritative source, but from my memory: the versioned parameter in the PEM-like OpenSSH private key format only affects the encryption method. That doesn’t matter in the slightest: it’s the KDF that’s broken. That’s an argument against piecemeal negotiation of parts of protocols, I’m sure. We’ll get you a blog post on that later.

    The full key is available here, just in case you feel like running john the ripper on something today: gist.github.com/lvh/c532c…

  • Factoring the Noise protocol matrix

    TL;DR: if I ever told you to use Noise, I probably meant Noise_IK and should have been more specific.

    The Noise protocol is one of the best things to happen to encrypted protocol design. WireGuard inherits its elegance from Noise. Noise is a cryptography engineer’s darling spec. It’s important not to get blindsided while fawning over it and to pay attention to where implementers run into trouble. Someone raised a concern I had run into before: Noise has a matrix.

    N(rs):
     ← s
     ...
     → e, es

    NN:
     → e
     ← e, ee

    KN:
     → s
     ...
     → e
     ← e, ee, se

    XN:
     → e
     ← e, ee
     → s, se

    IN:
     → e, s
     ← e, ee, se

    K(s, rs):
     → s
     ← s
     ...
     → e, es, ss

    NK:
     ← s
     ...
     → e, es
     ← e, ee

    KK:
     → s
     ← s
     ...
     → e, es, ss
     ← e, ee, se

    XK:
     ← s
     ...
     → e, es
     ← e, ee
     → s, se

    IK:
     ← s
     ...
     → e, es, s, ss
     ← e, ee, se

    X(s, rs):
     ← s
     ...
     → e, es, s, ss

    NX:
     → e
     ← e, ee, s, es

    KX:
     → s
     ...
     → e
     ← e, ee, se, s, es

    XX:
     → e
     ← e, ee, s, es
     → s, se

    IX:
     → e, s
     ← e, ee, se, s, es

    To a cryptography engineer, this matrix is beautiful. These eldritch runes describe a grammar: the number of ways you can meaningfully compose the phrases that can make up a Noise handshake into a proper protocol. The rest of the document describes what the trade-offs between them are: whether the protocol is one-way or interactive, whether you get resistance against key-compromise impersonation, what sort of privacy guarantees you get, et cetera.

    (Key-compromise impersonation means that if I steal your key, I can impersonate anyone to you.)

    To the layperson implementer, the matrix is terrifying. They hadn’t thought about key-compromise impersonation or the distinction between known-key, hidden-key and exposed-key protocols or even forward secrecy. They’re going to fall back to something else: something probably less secure but at least unambiguous on what to do.

    As Noise matures into a repository for protocol templates with wider requirements, this gets worse, not better. The most recent revision of the Noise protocol adds 23 new “deferred” variants. It’s unlikely these will be the last additions.

    Which Noise variant should they use? Depends on the application of course, but we can make some reasonable assumptions for most apps. Ignoring variants, we have:

    N    NN   KN   XN   IN
    K    NK   KK   XK   IK
    X    NX   KX   XX   IX

    Firstly, let’s assume you need bidirectional communication, meaning initiator and responder can send messages to each other as opposed to just initiator to responder. That gets rid of the first column of the matrix.

    NN   KN   XN   IN
    NK   KK   XK   IK
    NX   KX   XX   IX

    The other protocols are defined by two letters. From the spec:


    The first character refers to the initiator’s static key:

    • N = No static key for initiator
    • K = Static key for initiator Known to responder
    • X = Static key for initiator Xmitted (“transmitted”) to responder
    • I = Static key for initiator Immediately transmitted to responder, despite reduced or absent identity hiding

    The second character refers to the responder’s static key:

    • N = No static key for responder
    • K = Static key for responder Known to initiator
    • X = Static key for responder Xmitted (“transmitted”) to initiator

    NN provides confidentiality against a passive attacker but neither party has any idea who you’re talking to because no static (long-term) keys are involved. For most applications none of the *N suites make a ton of sense: they imply the initiator does not care who they’re connecting to.

    NK   KK   XK   IK
    NX   KX   XX   IX

    For most applications the client (initiator) ought to have a fixed static key so we have a convenient cryptographic identity for clients over time. So really, if you wanted something with an N in it, you’d know.

    KK   XK   IK
    KX   XX   IX

    The responder usually doesn’t know what the key is for any initiator that happens to show up. This mostly makes sense if you have one central initiator that reaches out to a lot of responders: something like an MDM or sensor data collection perhaps. In practice, you often end up doing egress from those devices anyway for reasons that have nothing to do with Noise. So, K* is out.

    XK   IK
    XX   IX

    These remaining suites generally trade privacy (how easily can you identify participants) for latency (how many round trips are needed).

    IX doesn’t provide privacy for the initiator at all, but that’s the side you usually care about. It still has the roundtrip downside, making it a niche variant.

    XX and XK require an extra round trip before they send over the initiator’s static key. Flip side: they have the strongest possible privacy protection for the initiator, whose identity is only sent to the responder after they’ve been authenticated and forward secrecy has been established.

    IK provides a reasonable tradeoff: no extra round trip and the initiator’s key is encrypted to the responder’s static key. That means that the initiator’s key is only disclosed if the responder’s key is compromised. You probably don’t care about that. It does require the initiator to know the static key of the responder ahead of time but that’s probably true anyway: you want to check that key against a trusted value. You can also try private keys for the responder offline but that doesn’t matter unless you gratuitously messed up key generation. In conclusion, you probably want IK.

    This breakdown only works if you’re writing a client-server application that plausibly might’ve used mTLS instead. WireGuard, for example, is built on Noise_IK. The other variants aren’t pointless: they’re just good at different things. If you care more about protecting your initiator’s privacy than you do about handshake latency, you want Noise_XK. If you’re doing a peer-to-peer IoT system where device privacy matters, you might end up with Noise_XX. (It’s no accident that IK, XK and XX are in the last set of protocols standing.)

    Protocol variants

    Ignore deferred variants for now. If you needed them you’d know. PSK is an interesting quantum computer hedge. We’ll talk more about quantum key exchanges in a different post, but briefly: a shared PSK among several participants protects against a passive adversary that records everything and acquires a quantum computer some time in the future, while retaining the convenient key distribution of public keys.

    Conclusion

    It’s incredible how much has happened in the last few years to make protocols safer, between secure protocol templates like Noise, new proof systems like Tamarin, and ubiquitous libraries of safer primitives like libsodium. So far, the right answer for a safe transport has almost always been TLS, perhaps mutually authenticated. That’s not going to change right away, but if you control both sides of the network and you need properties hard to get out of TLS, Noise is definitely The Right Answer. Just don’t stare at the eldritch rune matrix too long. You probably want Noise_IK. Or, you know, ask your security person :)

    Thanks to Katriel Cohn-Gordon for reviewing this blog post.

  • Loud subshells

    Default shells usually end in $. Unless you’re root and it’s #. That tradition has been around forever: people recognized the need to highlight you’re not just some random shmoe.

    These days we have lots of snazzy shell magic. You might still su, but you’re more likely to sudo. We still temporarily assume extra privileges. If you have access to more than one set of systems, like production and staging, you probably have ways of putting on a particular hat. Some combination of setting an environment variable, adding a key to ssh-agent, or assuming an AWS role with aws-vault. You know, so you don’t accidentally blow away prod.

    If a privilege is important enough not to have around all the time, it’s important enough to be reminded you have it. You’re likely to have more than one terminal open. You might want to be reminded when your extra privileges are about to expire. That might be something you just set up for your own environment. But as your organization grows, you’ll want to share that with others. If you develop software, you might want to make it easy for your users to get a similarly loud shell.

    Major shells you might care about: POSIX sh, bash, zsh, and fish. POSIX sh is a spec, not an implementation. POSIX sh compatibility means it’ll probably work in bash and zsh too. You might run into dash or busybox in a tiny Docker container image. Both are POSIXy. There’s large overlap between all of them but fish, which is different and proud of it.

    There are lots of ways to configure a shell prompt but PS1 is the default. Just PS1="xyzzy>" $SHELL doesn’t work. You get a new shell, but it will execute a plethora of configuration files. One of them will set PS1 indiscriminately and so your fancy prompt gets clobbered.

    So what do you do? Major techniques:

    • An rc-less shell.
    • A dedicated environment variable that your shell’s PS1 knows about.
    • Sourcing scripts
    • Crazy hacks like PROMPT_COMMAND or precmd or reconstituted rcfiles

    rc-less shells

    If you interpret the problem as shells having configuration, you can disable that. Unfortunately, the flags for this are different across shells:

    • bash uses the --norc flag
    • zsh uses the --no-rcs flag
    • dash doesn’t have a flag but instead reads from ENV

    You can’t necessarily count on any particular shell being available. The good news is you don’t have to care about fish here: it’s uncommon and you’ve already committed to giving people a limited shell. So, either count on everyone to have bash or write something like this:

    SHFLAGS=""
    case "$SHELL" in
        *zsh*) SHFLAGS="--no-rcs" ;;
        *bash*) SHFLAGS="--norc" ;;
    esac
    

    Eventually, you run $SHELL $SHFLAGS with PS1 set and you’re done. The good news is that rc-less shells will work pretty much anywhere. The bad news is that the shell customization people are used to is gone: aliases, paths, colors, shell history et cetera. If people don’t love the shell you give them, odds are they’re going to look for a workaround.

    Dedicated environment variables

    If you interpret the problem as setting PS1 being fundamentally wrong because that’s the shell configuration’s job, you could argue that the shell configuration should also just set the prompt correctly. Your job is not to set the prompt, but to give the shell everything it needs to set it for you.

    As an example, aws-vault already conveniently sets an environment variable for you, so you can do this in your zshrc:

    if [ -n "${AWS_VAULT}" ] ; then
      echo -e "$(tput setab 1)In aws-vault env ${AWS_VAULT}$(tput sgr0)"
      export PS1="$(tput setab 1)<<${AWS_VAULT}>>$(tput sgr0) ${PS1}";
    fi;
    

    Now, just use aws-vault’s exec to run a new shell:

    aws-vault exec work -- $SHELL
    

    … and you’ll get a bright red warning telling you that you have elevated privileges.

    I use tput here and you should too. Under the hood, it’ll produce regular old ANSI escape codes. $(tput setab 1) sets the background to color 1 (red). $(tput sgr0) resets to defaults. Three reasons you should use tput instead of manually entering escape codes:

    1. It’s more legible.
    2. It’s more portable across shells. If you did PS1="${REDBG}shouty${NORMAL}" it’ll work fine in bash but zsh will escape the escape codes and your prompt will have literal slashes and brackets in it. Unless you put a dollar sign in front of the double quote, which bash doesn’t like.
    3. It’s more portable across terminals.

    The downside to this is that it fails open. If nobody changes PS1 you don’t get a fancy prompt. It’s a pain to enforce this via MDM.

    source

    If you reinterpret the problem as the mistake of creating a subshell at all, you could instead modify the shell you’re in. You can do that simply by calling source somefile. This is how Python’s virtualenv works.

    The upside is that it’s pretty clean, the downside is that it’s pretty custom per shell. If you’re careful, odds are you can write it in POSIX sh and cover sh, bash, zsh in one go, though. Unless you need to support fish. Or eshell or whatever it is that one person on your team uses.

    PROMPT_COMMAND

    Let’s say you have no shame and you interpret the problem as shells refusing to bend to your eternal will. You can get past the above restriction with the rarely used PROMPT_COMMAND environment variable. Quoth the docs:

    Bash provides an environment variable called PROMPT_COMMAND. The contents of this variable are executed as a regular Bash command just before Bash displays a prompt.

    Because nobody sets PROMPT_COMMAND you can set it in the environment of a new shell process and it won’t get clobbered like PS1 would. Because it’s bash source, it can do whatever it wants, including setting environment variables like PS1 and un-setting environment variables like PROMPT_COMMAND itself. You know, something like:

    PROMPT_COMMAND='PS1="${PS1} butwhy>";unset PROMPT_COMMAND'
    

    Of course, you have now forfeited any pretense of being reasonable. This doesn’t play well with others. Another downside is that this doesn’t work in zsh. That has an equivalent precmd() function but won’t grab it from an environment variable. Which brings us to our final trick:

    Reconstituted rcfiles

    If you absolutely must, you could technically just cobble together the entire rcfile, including the parts that the shell would ordinarily source itself. I don’t really want to help you do that, but it’d probably look something like this:

    $ zsh --rcs <(echo "echo why are you like this")
    why are you like this
    

    Please don’t do that. Depending on context, use an rc-less shell or a dedicated environment variable or source something. Please?

  • A Child’s Garden of Inter-Service Authentication Schemes

    Modern applications tend to be composed from relationships between smaller applications. Secure modern applications thus need a way to express and enforce security policies that span multiple services. This is the “server-to-server” (S2S) authentication and authorization problem (for simplicity, I’ll mash both concepts into the term “auth” for most of this post).

    Designers today have a lot of options for S2S auth, but there isn’t much clarity about what the options are or why you’d select any of them. Bad decisions sometimes result. What follows is a stab at clearing the question up.

    Cast Of Characters

    Alice and Bob are services on a production VPC. Alice wants to make a request of Bob. How can we design a system that allows this to happen?

    Here’s, I think, a pretty comprehensive overview of available S2S schemes. I’ve done my best to describe the “what’s” and minimize the “why’s”, beyond just explaining the motivation for each scheme. Importantly, these are all things that reasonable teams use for S2S auth.

    Nothing At All

    Far and away the most popular S2S scheme is “no auth at all”. Internet users can’t reach internal services. There’s little perceived need to protect a service whose only clients are already trusted.

    Bearer Token

    Bearer tokens rule everything around us. Give Alice a small blob of data, such that when Bob sees that data presented, he assumes he’s talking to Alice. Cookies are bearer tokens. Most API keys are bearer tokens. OAuth is an elaborate scheme for generating and relaying bearer tokens. SAML assertions are delivered in bearer tokens.

    The canonical bearer token is a random string, generated from a secure RNG, that is at least 16 bytes long (that is: we generally consider 128 bits a reasonable common security denominator). But part of the point of a bearer token is that the holder doesn’t care what it is, so Alice’s bearer token could also encode data that Bob could recover. This is common in client-server designs and less common in S2S designs.
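
    Minting one is about as simple as security engineering gets. A sketch with Python’s secrets module (token_table is a stand-in for whatever store Bob keeps; it isn’t prescribed by anything above):

    import secrets

    # At least 16 bytes (128 bits) straight from the OS CSPRNG, URL-safe encoded.
    token = secrets.token_urlsafe(16)

    # On Bob's side the token is just an opaque lookup key:
    # identity = token_table.get(presented_token)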

    A few words about passwords

    S2S passwords are disappointingly common. You see them in a lot of over-the-Internet APIs (ie, for S2S relationships that span companies). A password is basically a bearer token that you can memorize and quickly type. Computers are, in 2018, actually pretty good at memorizing and typing, and so you should use real secrets, rather than passwords, in S2S applications.

    HMAC(timestamp)

    The problem with bearer tokens is that anybody who has them can use them. And they’re routinely transmitted. They could get captured off the wire, or logged by a proxy. This keeps smart ops people up at night, and motivates a lot of “innovation”.

    You can keep the simplicity of bearer tokens while avoiding the capture-in-flight problem by swapping the tokens for secrets, and using the secrets to authenticate a timestamp. A valid HMAC proves ownership of the shared secret without revealing it. You’d then proceed as with bearer tokens.
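
    A sketch of both sides (names are illustrative; the point is that only an HMAC over a fresh timestamp crosses the wire, never the secret itself):

    import hashlib, hmac, time

    WINDOW = 300  # seconds of clock skew / staleness we tolerate

    def make_auth(secret: bytes):
        ts = str(int(time.time()))
        tag = hmac.new(secret, ts.encode(), hashlib.sha256).hexdigest()
        return ts, tag   # Alice sends these; the secret never leaves home

    def check_auth(secret: bytes, ts: str, tag: str) -> bool:
        fresh = abs(time.time() - int(ts)) < WINDOW
        expected = hmac.new(secret, ts.encode(), hashlib.sha256).hexdigest()
        # Note: a captured (ts, tag) pair can still be replayed inside the window.
        return fresh and hmac.compare_digest(expected, tag)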

    A few words about TOTP

    TOTP is basically HMAC(timestamp) stripped down to make it easy for humans to briefly memorize and type. As with passwords, you shouldn’t see TOTP in S2S applications.

    A few words about PAKEs

    PAKEs are a sort of inexplicably popular cryptographic construction for securely proving knowledge of a password and, from that proof, deriving an ephemeral shared secret. SRP is a PAKE. People go out of their way to find applications for PAKEs. The thing to understand about them is that they’re fundamentally a way to extract cryptographic strength from passwords. Since this isn’t a problem computers have, PAKEs don’t make sense for S2S auth.

    Encrypted Tokens

    HMAC(timestamp) is stateful; it works because there’s pairwise knowledge of secrets and the metadata associated with them. Usually, this is fine. But sometimes it’s hard to get all the parties to share metadata.

    Instead of making that metadata implicit to the protocol, you can store it directly in the credential: include it alongside the timestamp and HMAC or encrypt it. This is how Rails cookie storage works; it’s also the dominant use case for JWTs. AWS-style request “signing” is another example (using HMAC and forgoing encryption).

    By themselves, encrypted tokens make more sense in client-server settings than they do for S2S. Unlike client-server, where a server can just use the same secret for all the clients, S2S tokens still require some kind of pairwise state-keeping.
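
    If you want to see the shape of the thing, Fernet (from the pyca/cryptography library) is a convenient stand-in for “encrypted, authenticated token with a timestamp baked in”. A sketch, not an endorsement of any particular claim format:

    import json
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()        # shared between issuer and verifier
    f = Fernet(key)

    # Issue: encrypt-then-MAC a blob of claims; Fernet adds the timestamp.
    token = f.encrypt(json.dumps({"svc": "alice", "scope": "read-only"}).encode())

    # Verify: authenticated decryption, rejecting anything older than 5 minutes.
    claims = json.loads(f.decrypt(token, ttl=300))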

    Macaroons

    You can’t easily design a system where Alice takes her encrypted token, reduces its security scope (for instance, from read-write to read-only), and then passes it to Dave to use on her behalf. No matter how “sophisticated” we make the encoding and transmission mechanisms, encrypted tokens still basically express bearer logic.

    Macaroons are an interesting (and criminally underused) construction that directly provides both delegation and attenuation. They’re a kind of token from which you can derive more restricted tokens (that’s the “attenuation”), and, if you want, pass that token to someone else to use without them being able to exceed the authorization you gave them. Macaroons accomplish this by chaining HMAC; the HMAC of a macaroon is the HMAC secret for its derived attenuated macaroons.
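
    The chaining trick is small enough to sketch directly. This is just the core idea, with made-up caveats; real macaroon libraries handle encoding, verification, and third-party caveats:

    import hashlib, hmac

    def mac(key: bytes, msg: bytes) -> bytes:
        return hmac.new(key, msg, hashlib.sha256).digest()

    root_key = b"only the minting service knows this"
    identifier = b"alice-session-42"

    # Mint: the signature is HMAC(root_key, identifier).
    sig = mac(root_key, identifier)

    # Attenuate: anyone holding the macaroon can add a caveat by chaining HMACs.
    # The new signature is HMAC(old signature, caveat): caveats can be added,
    # never removed.
    caveats = [b"scope = read-only", b"time < 2018-08-01"]
    for c in caveats:
        sig = mac(sig, c)

    # Verify: the minting service replays the chain from the root key it holds
    # (and then checks that every caveat is actually satisfied).
    check = mac(root_key, identifier)
    for c in caveats:
        check = mac(check, c)
    assert hmac.compare_digest(check, sig)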

    By adding encryption along with HMAC, Macaroons also express “third-party” conditions. Alice can get Charles to attest that Alice is a member of the super-awesome-best-friends-club, and include that in the Macaroon she delivers to Bob. If Bob also trusts Charles, Bob can safely learn whether Alice is in the club. Macaroons can flexibly express whole trees of these kinds of relationships, capturing identity, revocation, and… actually, revocation and identity are the only two big wins I can think of for this feature.

    Asymmetric Tokens

    You can swap the symmetric constructions used in tokens for asymmetric ones and get some additional properties.

    Using signatures instead of HMACs, you get non-repudiability: Bob can verify Alice’s token, but can’t necessarily mint a new Alice token himself.

    More importantly, you can eliminate pairwise configuration. Bob and Alice can trust Charles, who doesn’t even need to be online all the time, and from that trust derive mutual authentication.

    The trade-offs for these capabilities are speed and complexity. Asymmetric cryptography is much slower and much more error-prone than symmetric cryptography.

    Mutual TLS

    Rather than designing a new asymmetric token format, every service can have a certificate. When Alice connects to Bob, Bob can check a whitelist of valid certificate fingerprints, and whether Alice’s name on her client certificate is allowed. Or, you could set up a simple CA, and Bob could trust any certificate signed by the CA. Things can get more complex; you might take advantage of X.509 and directly encode claims in certs (beyond just names).
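
    A sketch of the fingerprint-whitelist flavor with Python’s ssl module. Everything here is illustrative: the file names, port, and allowed fingerprint are made up, and a real deployment would more likely lean on a CA or a service mesh:

    import hashlib, socket, ssl

    ALLOWED_FINGERPRINTS = {"0123...deadbeef"}     # sha256 of Alice's cert, hex

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("bob.crt", "bob.key")      # Bob's own certificate
    ctx.verify_mode = ssl.CERT_REQUIRED            # demand a client certificate
    ctx.load_verify_locations("internal-ca.pem")   # or pin individual certs

    with socket.create_server(("0.0.0.0", 8443)) as srv:
        conn, _ = srv.accept()
        with ctx.wrap_socket(conn, server_side=True) as tls:
            der = tls.getpeercert(binary_form=True)   # Alice's cert, DER bytes
            if hashlib.sha256(der).hexdigest() not in ALLOWED_FINGERPRINTS:
                raise PermissionError("client certificate not on the whitelist")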

    A few words about SPIFFE

    If you’re a Kubernetes person this scheme is also sometimes called SPIFFE.

    A few words about Tokbind

    If you’re a participant in the IETF TLS Working Group, you can combine bearer tokens and MTLS using tokbind. Think of tokbind as a sort of “TLS cookie”. It’s derived from the client and server certificate and survives multiple TLS connections. You can use a tokbind secret to sign a bearer token, resulting in a bearer token that is confined to a particular MTLS relationship that can’t be used in any other context.

    Magic Headers

    Instead of building an explicit application-layer S2S scheme, you can punt the problem to your infrastructure. Ensure all requests are routed through one or more trusted, stateful proxies. Have the proxies set headers on the forwarded requests. Have the services trust the headers.

    This accomplishes the same things a complicated Mutual TLS scheme does without requiring slow, error-prone public-key encryption. The trade-off is that your policy is directly coupled to your network infrastructure.

    Kerberos

    You can try to get the benefits of magic headers and encrypted tokens at the same time using something like Kerberos, where there’s a magic server trusted by all parties, but bound by cryptography rather than network configuration. Services need to be introduced to the Kerberos server, but not to each other; mutual trust of the Kerberos server, and authorization logic that lives on that Kerberos server, resolves all auth questions. Notably, no asymmetric cryptography is needed to make this work.

    Themes

    What are the things we might want to achieve from an S2S scheme? Here’s a list. It’s incomplete. Understand that it’s probably not reasonable to expect all of these things from a single scheme.

    Minimalism

    This goal is less obvious than it seems. People adopt complicated auth schemes without clear rationales. It’s easy to lose security by doing this; every feature you add to an application – especially security features – adds attack surface. From an application security perspective, “do the simplest thing you can get away with” has a lot of merit. If you understand and keep careful track of your threat model, “nothing at all” can be a security-maximizing option. Certainly, minimalism motivates a lot of bearer token deployments.

    The opposite of minimalism is complexity. A reasonable way to think about the tradeoffs in S2S design is to think of complexity as a currency you have to spend. If you introduce new complexity, what are you getting for it?

    Claims

    Authentication and authorization are two different things: who are you, and what are you allowed to do? Of the two problems, authorization is the harder one. An auth scheme can handle authorization, or assist authorization, or punt on it altogether.

    Opaque bearer token schemes usually just convey identity. An encrypted token, on the other hand, might bind claims: statements that limit the scope of what the token enables, or metadata about the identity of the requestor.

    Schemes that don’t bind claims can make sense if authorization logic between services is straightforward, or if there’s already a trusted system (for instance, a service discovery layer) that expresses authorization. Schemes that do bind claims can be problematic if the claims carried in a credential can be abused, or targeted by application flaws. On the other hand, an S2S scheme that supports claims can do useful things like propagating on-behalf-of requestor identities or supporting distributed tracing.

    Confinement

    The big problem with HTTP cookies is that once an attacker has captured one, they can abuse it however they see fit. You can do better than that by adding mitigations or caveats to credentials. They might be valid only for a short period of time, or valid only for a specific IP address (especially powerful when combined with short expiry), or, as in the case of Tokbind, valid only on a particular MTLS relationship.

    Statelessness

    Statelessness means Bob doesn’t have to remember much (or, ideally, anything) about Alice. This is an immensely popular motivator for some S2S schemes. It’s perceived as eliminating a potential performance bottleneck, and as simplifying deployment.

    The tricky thing about statelessness is that it often doesn’t make sense to minimize state, only to eliminate it. If pairwise statefulness creeps back into the application for some other reason (for instance, Bob has to remember anything at all about Alice), stateless S2S auth can spend a lot of complexity for no real gain.

    Pairwise Configuration

    Pairwise configuration is the bête noire of S2S operational requirements. An application secret that has to be generated once for each of several peers and that anybody might ever store in code is part of a scheme in which secrets are never, ever rotated. In a relatively common set of circumstances, pairwise config means that new services can only be introduced during maintenance windows.

    Still, if you have a relatively small and stable set of services (or if all instances of a particular service might simply share a credential), it can make sense to move complexity out of the application design and into the operational requirements. Also it makes sense if you have an ops team and you never have to drink with them.

    I kid, really, because if you can get away with it, not spending complexity to eliminate pairwise configuration can make sense. Also, many of the ways S2S schemes manage to eliminate pairwise configurations involve introducing yet another service, which has a sort of constant factor cost that can swamp the variable cost.

    Delegation and Attenuation

    People deploy a lot of pointless delegation. Application providers might use OAuth for their client-server login, for instance, even though no third-party applications exist. The flip side of this is that if you actually need delegation, you really want to have it expressed carefully in your protocol. The thing you don’t want to do is ever share a bearer token.

    Delegation can show up in internal S2S designs as a building block. For instance, a Macaroon design might have a central identity issuance server that grants all-powerful tokens to systems that in turn filter them for specific requestors.

    Some delegation schemes have implied or out-of-band attenuation. For instance, you might not be able to look at an OAuth token and know what its restrictions are. These systems are rough in practice; from an operational security perspective, your starting point probably needs to be that any lost token is game-over for its owner.

    A problem with writing about attenuation is that Macaroons express it so well that it’s hard to write about its value without lapsing into the case for Macaroons.

    Flexibility

    If you use JSON as your credential format, and you later build a feature that allows a credential to express not just Alice’s name but also whether she’s an admin, you can add that feature without changing the credential format. Later, attackers can add the feature where they turn any user into an admin, and you can then add the feature that breaks that attack. JSON is just features all the way down.

    I’m only mostly serious. If you’re doing something more complicated than a bearer token, you’re going to choose an extensible mechanism. If not, I already made the case for minimalism.

    Coupling

    All things being equal, coupling is bad. If your S2S scheme is expressed by network controls and unprotected headers, it’s tightly coupled to the network deployment, which can’t change without updating the security scheme. But if your network configuration doesn’t change often, that limitation might save you a lot of complexity.

    Revocation

    People talk about this problem a lot. Stateless schemes have revocation problems: the whole point of a stateless scheme is for Bob not to have to remember anything about Alice (other than perhaps some configuration that says Alice is allowed to make requests, but not Dave, and this gets complicated really quickly and can quickly call into question the value of statelessness but let’s not go there). At any rate: a stateless bearer token will eventually be compromised, and you can’t just let it get used over and over again to steal data.

    The two mainstream answers to this problem are short expiry and revocation lists.

    Short expiry addresses revocation if: (a) you have a dedicated auth server, and the channel to that server is somehow more secure than the channel between Alice and Bob; (b) the auth server relies on a long-lived secret that never appears on the less-secure channel; and (c) the auth server issues an access secret that is transmitted on the less-secure channel, but lives only for a few minutes. These schemes are called “refresh tokens”. Refresh tends to find its way into a lot of designs where this fact pattern doesn’t hold. Security design is full of wooden headphones and coconut phones.

    Revocation lists (and, usually, some attendant revocation service) are a sort of all-purpose solution to this problem; you just blacklist revoked tokens, for at least as long as the lifetime of the token. This obviously introduces state, but it’s a specific kind of state that doesn’t (you hope) grow as quickly as your service does. If it’s the only state you have to keep, it’s nice to have the flexibility of putting it wherever you want.
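
    A minimal sketch of that kind of state, assuming a shared Redis instance and hypothetical token IDs; the only trick is giving each denylist entry a TTL equal to the token's remaining lifetime:

    ```python
    import time
    import redis  # assumes a reachable Redis instance; any shared store works

    r = redis.Redis()

    def revoke(token_id, token_expiry_unix):
        # Denylist the token ID only for as long as the token itself could live.
        ttl = max(1, int(token_expiry_unix - time.time()))
        r.setex(f"revoked:{token_id}", ttl, 1)

    def is_revoked(token_id):
        return r.exists(f"revoked:{token_id}") == 1
    ```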

    Rigidity

    It is hard to screw up a random bearer token. Alice stores the token and supplies it on requests. Bob uses the token to look up an entry in a database. There aren’t a lot of questions.
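
    To make “not a lot of questions” concrete, here’s roughly the whole lifecycle of a random bearer token, sketched in Python with an in-memory stand-in for Bob’s database:

    ```python
    import hashlib
    import secrets

    TOKENS = {}  # hash(token) -> principal; stands in for Bob's token table

    def issue_token(principal):
        # 256 bits from the OS CSPRNG; Bob stores only a hash, so a leaked
        # token table doesn't hand out usable credentials.
        token = secrets.token_urlsafe(32)
        TOKENS[hashlib.sha256(token.encode()).hexdigest()] = principal
        return token

    def authenticate(presented):
        return TOKENS.get(hashlib.sha256(presented.encode()).hexdigest())
    ```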

    It is extraordinarily easy to screw up JWT. JWT is a JSON format where you have to parse and interpret a JSON document to figure out how to decrypt and authenticate a JSON document. It has revived bugs we thought long dead, like “repurposing asymmetric public keys as symmetric private keys”.

    Problems with rigidity creep up a lot in distributed security. The first draft of this post said that MTLS was rigid; you’re either speaking TLS with a client cert or you’re not. But that ignores how hard X.509 validation is. If you’re not careful, an attacker can just ask Comodo for a free email certificate and use it to access your services. Worse still, MTLS can “fail open” in a way that TLS sort of doesn’t: if a service forgets to check for client certificates, TLS will still get negotiated, and you might not notice until an attacker does.

    Long story short: bearer tokens are rigid. JWT is a kind of evil pudding. Don’t use JWT.

    Universality

    A nice attribute of widely deployed MTLS is that it can mitigate SSRF bugs (the very bad bug where an attacker coerces one of your services to make an arbitrary HTTP request, probably targeting your internal services, on their behalf). If the normal HTTP-request-generating code doesn’t add a client certificate, and every internal service needs to see one to honor a request, you’ve limited the SSRF attacker’s options a lot.

    On the other hand, we forget that a lot of our internal services consist of code that we didn’t write. The best example of this is Redis, which for years proudly waved the banner of “if you can talk to it, you already own the whole application”.

    It’s helpful if we can reasonably expect an auth control to span all the systems we use, from Postgres to our custom revocation server. That might be a realistic goal with Kerberos, or with network controls and magic headers; with tunnels or proxies, it’s even something you can do with MTLS – this is a reason MTLS is such a big deal for Kubernetes, where it’s reasonable for the infrastructure to provide every container with an MTLS-enabled Envoy proxy. On the other hand it’s unlikely to be something you can achieve with Macaroons or evil puddings.

    Performance and Complexity

    If you want performance and simplicity, you probably avoid asymmetric crypto, unless your request frequency is (and will remain) quite low. Similarly, you’d probably want to avoid dedicated auth servers, especially if Bob needs to be in constant contact with them for Alice to make requests to him; this is a reason people tend to migrate away from Kerberos.

    Our Thoughts

    Do the simplest thing that makes sense for your application right now. A true fact we can relate from something like a decade of consulting work on these problems: intricate S2S auth schemes are not the norm; if there’s a norm, it’s “nothing at all except for ELBs”. If you need something, but you have to ask whether that something oughtn’t just be bearer tokens, then just use bearer tokens.

    Unfortunately, if there’s a second norm, it’s adopting complicated auth mechanisms independently or, worse, in combination, and then succumbing to vulnerabilities.

    Macaroons are inexplicably underused. They’re the Velvet Underground of authentication mechanisms, hugely influential but with little radio airplay. Unlike the Velvets, Macaroons aren’t overrated. They work well for client-server auth and for S2S auth. They’re very flexible but have reassuring format rigidity, and they elegantly take advantage of just a couple simple crypto operations. There are libraries for all the mainstream languages. You will have a hard time coming up with a scenario where we’d try to talk you out of using them.
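
    If you haven’t seen one before, here’s a minimal sketch using the pymacaroons library; the location, identifier, and caveat strings are invented for the example:

    ```python
    from pymacaroons import Macaroon, Verifier

    secret = "a-random-root-key-from-your-kms"  # per-service root key (illustrative)

    # Mint a broad token, then attenuate it before handing it to a caller.
    m = Macaroon(location="reports.internal", identifier="key-v1", key=secret)
    m.add_first_party_caveat("service = reports")
    m.add_first_party_caveat("action = read")

    # Bob verifies the signature and every caveat against predicates he accepts.
    v = Verifier()
    v.satisfy_exact("service = reports")
    v.satisfy_exact("action = read")
    assert v.verify(m, secret)
    ```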

    JWT is a standard that tries to do too much and ends up doing everything haphazardly. Our loathing of JWT motivated this post, but this post isn’t about JWT; we’ll write more about it in the future.

    If your inter-service auth problem really decomposes to inter-container (or, without containers, inter-instance) auth, MTLS starts to make sense. The container-container MTLS story usually involves containers including a proxy, like Envoy, that mediates access. If you’re not connecting containers, or have ad-hoc components, MTLS can really start to take on a CORBA feel: random sidecar processes (here stunnel, there Envoy, and this one app that tries to do everything itself). It can be a pain to configure properly, and this is a place you need to get configurations right.

    If you can do MTLS in such a way that there is exactly one way all your applications use it (probably: a single proxy that all your applications install), consider MTLS. Otherwise, be cautious about it.

    Beyond that, we don’t want to be too much more prescriptive. Rather, we’d just urge you to think about what you’re actually getting from an S2S auth scheme before adopting it.

    (But really, you should just use Macaroons.)

  • Gripes with Google Groups

    If you’re like me, you think of Google Groups as the Usenet client turned mailing list manager. If you’re a GCP user or maybe one of a handful of SAML users you probably know Google Groups as an access control mechanism. The bad news is we’re both right.

    This can blow up if permissions on those groups aren’t set right. Your groups were probably originally created by a sleep-deprived founder way before anyone was worried about access control. They’ve been lovingly handcrafted and never audited since. Let’s say their configuration is, uh, “inconsistent”. If an administrator adds people to the right groups as part of their on-boarding, it’s not obvious when group membership is secretly self-service. Even if someone can’t join a group, they might still be able to read it.

    You don’t even need something using group membership as access control for this to go south. The simplest way is a password reset email. (Having a list of all of your vendors feels like a dorky compliance requirement, but it’s underrated. Being able to audit which ones have multi-factor authentication is awesome.)

    Some example scenarios:

    Scenario 1 You get your first few customers and start seeing fraud. You create a mailing list with the few folks who want to talk about that topic. Nobody imagined that dinky mailing list would grow into a full-fledged team, let alone one with permissions to a third-party analytics suite that has access to all your raw data.

    Scenario 2 The engineering team treats its mailing list as open access for the entire company. Ops deals with ongoing incidents candidly and has had bad experiences with nosy managers looking for scapegoats. That’s great until someone in ops extends an access control check in some custom software that gates on ops@ to also include engineering@.

    Scenario 3 board@ gets a new investor who insists on using their existing email address. An administrator confuses the Google Groups setting for allowing out-of-domain addresses with allowing out-of-domain registration. Everyone on the Internet can read the cap table for your next funding round.

    This is a mess. It bites teams that otherwise have their ducks in a row. Cleaning it up gets way worse down the line. Get in front of it now and you probably won’t have to worry about it until someone makes you audit it, which is probably 2-3 years from now.

    Google Groups has some default configurations for new groups these days:

    • Public (Anyone in ${DOMAIN} can join, post messages, view the members list, and read the archives.)
    • Team (Only managers can invite new members, but anyone in ${DOMAIN} can post messages, view the members list, and read the archives.)
    • Announcement-only (Only managers can post messages and view the members list, but anyone in ${DOMAIN} can join and read the archives.)
    • Restricted (Only managers can invite new members. Only members can post messages, view the members list, and read the archives. Messages to the group do not appear in search results.)

    This is good but doesn’t mean you’re out of the woods:

    • These are just defaults for access control settings. Once a group is created, you get to deal with the combinatorial explosion of options. Most of them don’t really make sense. You probably don’t know when someone messes with the group, though.
    • People rarely document intent in the group description (or anywhere for that matter). When a group deviates, you have no idea if it was supposed to.
    • “Team” lets anyone in the domain read. That doesn’t cover “nosy manager” or “password reset” scenarios.

    Auditing this is kind of a pain. The UI is slow and relevant controls are spread across multiple pages. Even smallish companies end up with dozens of groups. The only way we’ve found to make this not suck is by using the GSuite Admin SDK and that’s a liberal definition of “not suck”.
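
    For what it’s worth, the sketch below shows the shape of that audit with google-api-python-client; it assumes you’ve already built a delegated admin credential (`creds`), and the printed settings fields are just the ones we find most interesting:

    ```python
    from googleapiclient.discovery import build

    # `creds` is assumed: an authorized admin credential, e.g. a domain-wide
    # delegated service account with the directory and groups-settings scopes.
    directory = build("admin", "directory_v1", credentials=creds)
    settings = build("groupssettings", "v1", credentials=creds)

    groups = directory.groups().list(customer="my_customer", maxResults=200).execute()
    for group in groups.get("groups", []):
        cfg = settings.groups().get(groupUniqueId=group["email"]).execute()
        print(
            group["email"],
            cfg.get("whoCanJoin"),
            cfg.get("whoCanPostMessage"),
            cfg.get("whoCanViewGroup"),
        )
    ```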

    You should have a few archetypes of groups. Put the name in the group itself, because that way the expected audience and access control is obvious to users and auditors alike. Here are some archetypes we’ve found:

    • Team mailing lists, should be called xyzzy-team@${DOMAIN}. Only has team members, no external members, no self-service membership.
    • Internal-facing mailing lists, should be called xyzzy-corp@${DOMAIN}. Public self-serve access for employees, no external members, limit posting to domain members or mailing list members. These are often associated with a team, but unlike -team mailing lists anyone can join them.
    • External-facing lists. Example: contracts-inbound@${DOMAIN}. No self-serve access, no external members, but anyone can post.
    • External member lists (e.g. boards, investors): board-ext@${DOMAIN}. No self-serve access, external members allowed, and either members only or anyone at the domain can post.

    PS: Groups can let some users post as the group. I haven’t run a phishing exercise that way, but I’m guessing an email appearing to legitimately come from board@company.com is going to be pretty effective.

  • There Will Be WireGuard

    Amidst the hubbub of the Efail PGP/SMIME debacle yesterday, the WireGuard project made a pretty momentous announcement: a MacOS command line version of the WireGuard VPN is now available for testing, and should stabilize in the coming few months. I’m prepared to be wrong, but I think that for a lot of young tech companies, this might be the biggest thing to happen to remote access in decades.

    WireGuard is a modern, streamlined VPN protocol that Jason Donenfeld developed based on Trevor Perrin’s Noise protocol framework. Imagine a VPN with the cryptographic sophistication of Signal Protocol and you’re not far off. Here are the important details:

    WireGuard is orders of magnitude smaller than the IPSEC or OpenVPN stacks. On Linux, the codebase is something like 4500 lines. It’s designed to be simple and easy to audit. Simplicity and concision are goals of the whole system, from protocol to implementation. The protocol was carefully designed to make it straightforward to implement without dynamic memory allocation, eliminating whole classes of memory lifecycle vulnerabilities. The crypto underpinning WireGuard is non-negotiably DJB’s ChaPoly stack, eliminating handshake and negotiation vulnerabilities.

    WireGuard is fast; faster than strongSwan or OpenVPN.

    WireGuard is extremely simple to configure. In fact, it may be pretty close to the platonic ideal of configurability: you number both ends of the VPN, generate keypairs, point the client at the server, and you’re done.

    Linux people have had WireGuard for many months now (WireGuard is so good that team members here at Latacora used to run Linux VMs to get it). But the most important use case for VPNs for startups is to get developers access to cloud deployment environments, and developers use MacOS, which until now made WireGuard hard to recommend.

    Not for much longer.

    It’s a little hard to overstate how big a deal this is. strongSwan and OpenVPN are two of the scariest bits of infrastructure startups operate for themselves. Nobody trusts either codebase, or, for that matter, either crypto protocol. Both are nightmares to configure and manage. As a result, fewer people set up VPNs than should; a basic building block of secure access management is hidden away.

    We’re enthusiastic about WireGuard and think startups should look into adopting it as soon as is practicable. It’s simple enough to set up that you can just run it alongside your existing VPN infrastructure until you’re comfortable with it.

    Death to SSH over the public Internet. Death to OpenVPN. Death to IPSEC. Long live WireGuard!

  • Dumb Security Questionnaires

    It’s weird to say this but a significant part of the value we provide clients is filling out Dumb Security Questionnaires (hereafter DSQs, since the only thing more irritating than a questionnaire is spelling “questionnaire”).

    Daniel Miessler complains about DSQs, arguing that self-assessment is an intrinsically flawed concept.

    Meh. I have bigger problems with them.

    First, most DSQs are terrible. We get on calls with prospective clients and tell them “these DSQs were all first written in the early 1990s and lovingly handed down from generation to generation of midwestern IT secops staff”. Oh, how clients laugh and laugh. But we’re not joking. That’s really how those DSQs got written.

    You can tell, because they ask insane questions. “Document your intrusion detection deployment.” It’s 2018. Nobody is deploying RealSecure sensors anymore. I don’t need the most recent signatures for CVE-2017-8682. My threat actors aren’t exploiting Windows font library bugs on AWS VPCs.

    This seems super obvious but: we meet companies all the time that got a DSQ and then went and deployed a bunch of IDS sensors, or set up automatic web security scanners, or, God help us, tried to get a WAF up and running.

    So that is a reason DSQs are bad.

    Another reason is that they’re mostly performative. Here’s a timeless DSQ question: “provide detailed network maps of all your environments”. Ok, network maps can be useful. But what is the DSQ owner doing with that information? You could draw pretty much anything. Connect your VPCs in ways that make them spell bad words. Nobody is carefully following sources and sinks and looking for vulnerabilities. The point of that question is: “are you sophisticated enough to draw a network map”.

    The Vendor Security Alliance is an armada of high-status tech companies — Atlassian, Square, Dropbox, for some reason GoDaddy, a bunch of other companies, and Twitter. The purpose of the VSA was to build the ultimate DSQ. A lot of the VSA companies have excellent security teams. How I know that is, the VSA’s DSQ is full of performative questions I suspect were bikeshedded in by, for example, the one engineer at Docker who one time had to write a document called “Docker Encryption Standard” and now every vendor being hazed by the VSA has to provide their own Encryption Standard.

    I do not believe the VSA is qualified to evaluate Encryption Standards! Well: Square, maybe. But I don’t think Square is the reason that question is there.

    This brings me to my third and final problem with DSQs, which is that they’re based on a broken premise. That premise is: “most vendors have security teams that look something like the security team at Square”.

    How I know that is, the VSA wants you to explain how you: segmented your network for least-privilege, enrolled all your application secrets in a secret-sharing scheme, run a bug bounty program, have data loss prevention running on all your endpoints, actively prevent OWASP-style web vulnerabilities, have a vulnerability management program tied into patch triage, run security awareness training, have a full complement of ISO-aspirational security policy documents, have a SAML SSO system, use 2FA, have complete classification of all the data you handle, an IR policy, and, of course, “threat modeling” as part of your SDLC.

    Except for threat modeling, these are all good things. But come on.

    I spent 10 years as an app pentester. I’ve killed women and children. I’ve killed just about everything that walks or crawled at one time or another. And I’m here to tell you: most banks would have a hard time checking all the boxes in the VSA. And they spend 8-9 digits annually on infosec programs. (I’m just guessing about that).

    Technology vendors — young SaaS companies — are not banks, nor are they Square. A lot of the VSA DSQ items would be silly for them to attempt to do seriously.

    What are the 5 most likely ways a SaaS company is going to get owned up?

    1. A developer is going to leave an AWS credential somewhere an attacker can find it.
    2. An employee password is going to get credential-stuffed into an admin interface.
    3. A developer is going to forget how to parameterize an ORDER BY clause and introduce an SQLI vulnerability.
    4. A developer is going to set up a wiki or a Jenkins server on an EC2 instance with a routable IP and an open security group.
    5. I’m sure there’s a 5th way but I’m drawing a blank.

    Someone — and I am not volunteering — should write the DSQ that just nails these basic things. 10 questions, no diagrams.

  • Cryptographic Right Answers

    We’re less interested in empowering developers and a lot more pessimistic about the prospects of getting this stuff right.

    There are, in the literature and in the most sophisticated modern systems, “better” answers for many of these items. If you’re building for low-footprint embedded systems, you can use STROBE and a sound, modern, authenticated encryption stack entirely out of a single SHA-3-like sponge construction. You can use NOISE to build a secure transport protocol with its own AKE. Speaking of AKEs, there are, like, 30 different password AKEs you could choose from.

    But if you’re a developer and not a cryptography engineer, you shouldn’t do any of that. You should keep things simple and conventional and easy to analyze; “boring”, as the Google TLS people would say.

    (This content has been developed and updated by different people over a decade. We’ve kept what Colin Percival originally said in 2009, Thomas Ptacek said in 2015, and what we’re saying in 2018 for comparison. If you’re designing something today, just use the 2018 Latacora recommendation.)

    Cryptographic Right Answers

    Encrypting Data

    Percival, 2009: AES-CTR with HMAC.

    Ptacek, 2015: (1) NaCl/libsodium’s default, (2) ChaCha20-Poly1305, or (3) AES-GCM.

    Latacora, 2018: KMS or XSalsa20+Poly1305

    You care about this if: you’re hiding information from users or the network.

    If you are in a position to use KMS, Amazon’s (or Google’s) Hardware Security Module time share, use KMS. If you could use KMS but encrypting is just a fun weekend project and you might be able to save some money by minimizing your KMS usage, use KMS. If you’re just encrypting secrets like API tokens for your application at startup, use SSM Parameter Store, which is KMS. You don’t have to understand how KMS works.

    Otherwise, what you want ideally is “AEAD”: authenticated encryption with additional data (the option for plaintext authenticated headers).

    The mainstream way to get authenticated encryption is to use a stream cipher (usually: AES in CTR mode) composed with a polynomial MAC (a cryptographic CRC).

    The problem you’ll run into with all those mainstream options is nonces: they want you to come up with a unique (usually random) number for each stream which can never be reused. It’s simplest to generate nonces from a secure random number generator, so you want a scheme that makes that easy.

    Nonces are particularly important for AES-GCM, which is the most popular mode of encryption. Unfortunately, AES-GCM is also where nonce handling is trickiest: random nonces are just-barely-but-maybe-not-quite on the border of safe to use.

    So we recommend you use XSalsa20-Poly1305. This is a species of “ChaPoly” constructions, which, put together, are the most common encryption constructions outside of AES-GCM. Get XSalsa20-Poly1305 from libsodium or NaCl.

    The advantage to XSalsa20 over ChaCha20 and Salsa20 is that XSalsa supports an extended nonce; it’s big enough that you can simply generate a big long random nonce for every stream and not worry about how many streams you’re encrypting.
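
    As a concrete sketch: with PyNaCl, the Python libsodium binding, the whole job is a few lines, and the library generates the random 24-byte nonce for you (SecretBox is XSalsa20-Poly1305):

    ```python
    from nacl.secret import SecretBox
    from nacl.utils import random as nacl_random

    key = nacl_random(SecretBox.KEY_SIZE)  # 32 bytes; keep it in KMS/SSM, not in code
    box = SecretBox(key)

    ciphertext = box.encrypt(b"attack at dawn")  # random nonce generated and prepended
    assert box.decrypt(ciphertext) == b"attack at dawn"
    ```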

    There are “NMR” or “MRAE” schemes in the pipeline that promise some degree of security even if nonces are mishandled; these include GCM-SIV (all the SIVs, really) and CAESAR-contest-finalist Deoxys-II. They’re interesting, but nobody really supports or uses them yet, and with an extended nonce, the security win is kind of marginal. They’re not boring. Stay boring for now.

    Avoid: AES-CBC, AES-CTR by itself, block ciphers with 64-bit blocks — most especially Blowfish, which is inexplicably popular, OFB mode. Don’t ever use RC4, which is comically broken.

    Symmetric key length

    Percival, 2009: Use 256-bit keys.

    Ptacek, 2015: Use 256-bit keys.

    Latacora, 2018: Go ahead and use 256-bit keys.

    You care about this if: you’re using cryptography.

    But remember: your AES key is far less likely to be broken than your public key pair, so the latter key size should be larger if you’re going to obsess about this.

    Avoid: constructions with huge keys, cipher “cascades”, key sizes under 128 bits.

    Symmetric “Signatures”

    Percival, 2009: Use HMAC.

    Ptacek, 2015: Yep, use HMAC.

    Latacora, 2018: Still HMAC.

    You care about this if: you’re securing an API, encrypting session cookies, or are encrypting user data but, against medical advice, not using an AEAD construction.

    If you’re authenticating but not encrypting, as with API requests, don’t do anything complicated. There is a class of crypto implementation bugs that arises from how you feed data to your MAC, so, if you’re designing a new system from scratch, Google “crypto canonicalization bugs”. Also, use a secure compare function.
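
    For example, in Python the standard library gets you both the MAC and the secure compare (the key management here is hand-waved):

    ```python
    import hashlib
    import hmac

    def sign(key, message):
        # HMAC-SHA256 tag over the exact bytes you intend to authenticate.
        return hmac.new(key, message, hashlib.sha256).hexdigest()

    def verify(key, message, presented_tag):
        # Constant-time comparison; never compare MACs with ==.
        return hmac.compare_digest(sign(key, message), presented_tag)
    ```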

    If you use HMAC, people will feel the need to point out that SHA3 (and the truncated SHA2 hashes) can do “KMAC”, which is to say you can just concatenate the key and data and hash them and be secure. This means that in theory HMAC is doing unnecessary extra work with SHA-3 or truncated SHA-2. But who cares? Think of HMAC as cheap insurance for your design, in case someone switches to non-truncated SHA-2.

    Avoid: custom “keyed hash” constructions, HMAC-MD5, HMAC-SHA1, complex polynomial MACs, encrypted hashes, CRC.

    Hashing algorithm

    Percival, 2009: Use SHA256 (SHA-2).

    Ptacek, 2015: Use SHA-2.

    Latacora, 2018: Still SHA-2.

    You care about this if: you always care about this.

    If you can get away with it: use SHA-512/256, which truncates its output and sidesteps length extension attacks.

    We still think it’s less likely that you’ll upgrade from SHA-2 to SHA-3 than it is that you’ll upgrade from SHA-2 to something faster than SHA-3, and SHA-2 still looks great, so get comfortable and cuddly with SHA-2.

    Avoid: SHA-1, MD5, MD6.

    Random IDs

    Percival, 2009: Use 256-bit random numbers.

    Ptacek, 2015: Use 256-bit random numbers.

    Latacora, 2018: Use 256-bit random numbers.

    You care about this if: you always care about this.

    From /dev/urandom.
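
    In Python, for instance, the secrets module reads from the OS CSPRNG, so a 256-bit random ID is a one-liner:

    ```python
    import secrets

    request_id = secrets.token_hex(32)   # 32 bytes = 256 bits, hex-encoded
    api_key = secrets.token_urlsafe(32)  # same entropy, URL-safe base64
    ```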

    Avoid: userspace random number generators, the OpenSSL RNG, haveged, prngd, egd, /dev/random.

    Password handling

    Percival, 2009: scrypt or PBKDF2.

    Ptacek, 2015: In order of preference, use scrypt, bcrypt, and then if nothing else is available PBKDF2.

    Latacora, 2018: In order of preference, use scrypt, argon2, bcrypt, and then if nothing else is available PBKDF2.

    You care about this if: you accept passwords from users or, anywhere in your system, have human-intelligible secret keys.

    But, seriously: you can throw a dart at a wall to pick one of these. Technically, argon2 and scrypt are materially better than bcrypt, which is much better than PBKDF2. In practice, it mostly matters that you use a real secure password hash, and not as much which one you use.

    Don’t build elaborate password-hash-agility schemes.
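
    As one concrete option among the ones above, here’s argon2 via the argon2-cffi library; a bcrypt or hashlib.scrypt sketch would look nearly identical:

    ```python
    from argon2 import PasswordHasher
    from argon2.exceptions import VerifyMismatchError

    ph = PasswordHasher()  # library defaults; salting is handled for you

    stored = ph.hash("correct horse battery staple")  # encoded hash string for your DB

    def check(stored_hash, candidate):
        try:
            return ph.verify(stored_hash, candidate)
        except VerifyMismatchError:
            return False
    ```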

    Avoid: SHA-3, naked SHA-2, SHA-1, MD5.

    Asymmetric encryption

    Percival, 2009: Use RSAES-OAEP with SHA256 and MGF1+SHA256 bzzrt pop ffssssssst exponent 65537.

    Ptacek, 2015: Use NaCl/libsodium (box / crypto_box).

    Latacora, 2018: Use Nacl/libsodium (box / crypto_box).

    You care about this if: you need to encrypt the same kind of message to many different people, some of them strangers, and they need to be able to accept the message asynchronously, like it was store-and-forward email, and then decrypt it offline. It’s a pretty narrow use case.

    Of all the cryptographic “right answers”, this is the one you’re least likely to get right on your own. Don’t freelance public key encryption, and don’t use a low-level crypto library like OpenSSL or BouncyCastle.

    Here are several reasons you should stop using RSA and switch to elliptic curve:

    • RSA (and DH) drag you towards “backwards compatibility” (ie: downgrade-attack compatibility) with insecure systems.
    • RSA begs implementors to encrypt directly with its public key primitive, which is usually not what you want to do.
    • RSA has too many knobs. In modern curve systems, like Curve25519, everything is pre-set for security.

    NaCl uses Curve25519 (the most popular modern curve, carefully designed to eliminate several classes of attacks against the NIST standard curves) in conjunction with a ChaPoly AEAD scheme. Your language will have bindings (or, in the case of Go, its own library implementation) to NaCl/libsodium; use them. Don’t try to assemble this yourself. Libsodium has a list.
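
    A sketch of what box / crypto_box usage looks like through PyNaCl:

    ```python
    from nacl.public import PrivateKey, Box

    alice_key = PrivateKey.generate()
    bob_key = PrivateKey.generate()

    # Alice encrypts to Bob; the Box also authenticates her to Bob.
    to_bob = Box(alice_key, bob_key.public_key)
    ciphertext = to_bob.encrypt(b"meet me at the usual place")

    # Bob decrypts with his private key and Alice's public key.
    from_alice = Box(bob_key, alice_key.public_key)
    assert from_alice.decrypt(ciphertext) == b"meet me at the usual place"
    ```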

    Don’t use RSA.

    Avoid: Systems designed after 2015 that use RSA, RSA-PKCS1v15, RSA, ElGamal, I don’t know, Merkle-Hellman knapsacks? Just avoid RSA.

    Asymmetric signatures

    Percival, 2009: Use RSASSA-PSS with SHA256 then MGF1+SHA256 in tricolor systemic silicate orientation.

    Ptacek, 2015: Use Nacl, Ed25519, or RFC6979.

    Latacora, 2018: Use Nacl or Ed25519.

    You care about this if: you’re designing a new cryptocurrency. Or, a system to sign Ruby Gems or Vagrant images, or a DRM scheme, where the authenticity of a series of files arriving at random times needs to be checked offline against the same secret key. Or, you’re designing an encrypted message transport.

    The allegations from the previous answer are incorporated herein as if stated in full.

    The two dominating use cases within the last 10 years for asymmetric signatures are cryptocurrencies and forward-secret key agreement, as with ECDHE-TLS. The dominating algorithms for these use cases are all elliptic-curve based. Be wary of new systems that use RSA signatures.

    In the last few years there has been a major shift away from conventional DSA signatures and towards misuse-resistant “deterministic” signature schemes, of which EdDSA and RFC6979 are the best examples. You can think of these schemes as “user-proofed” responses to the PlayStation 3 ECDSA flaw, in which reuse of a random number leaked secret keys. Use deterministic signatures in preference to any other signature scheme.

    Ed25519, the NaCl/libsodium default, is by far the most popular public key signature scheme outside of Bitcoin. It’s misuse-resistant and carefully designed in other ways as well. You shouldn’t freelance this either; get it from NaCl.
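
    Via PyNaCl, for concreteness:

    ```python
    from nacl.signing import SigningKey

    signing_key = SigningKey.generate()  # Ed25519 private key
    verify_key = signing_key.verify_key  # share this one

    signed = signing_key.sign(b"release artifact digest goes here")
    verify_key.verify(signed)            # raises BadSignatureError if tampered with
    ```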

    Avoid: RSA-PKCS1v15, RSA, ECDSA, DSA; really, especially avoid conventional DSA and ECDSA.

    Diffie-Hellman

    Percival, 2009: Operate over the 2048-bit Group #14 with a generator of 2.

    Ptacek, 2015: Probably still DH-2048, or Nacl.

    Latacora, 2018: Probably nothing. Or use Curve25519.

    You care about this if: you’re designing an encrypted transport or messaging system that will be used someday by a stranger, and so static AES keys won’t work.

    The 2015 version of this document confused the hell out of everyone.

    Part of the problem is that our “Right Answers” are a response to Colin Percival’s “Right Answers”, and his included a “Diffie-Hellman” answer, as if “Diffie-Hellmanning” was a thing developers routinely do. In reality, developers simply shouldn’t freelance their own encrypted transports. To get a sense of the complexity of this issue, read the documentation for the Noise Protocol Framework. If you’re doing a key-exchange with DH, you probably want an authenticated key exchange (AKE) that resists key compromise impersonation (KCI), and so the primitive you use for DH is not the only important security concern.

    But whatever.

    It remains the case: if you can just use NaCl, use NaCl. You don’t even have to care what NaCl does. That’s the point of NaCl.

    Otherwise: use Curve25519. There are libraries for virtually every language. In 2015, we were worried about encouraging people to write their own Curve25519 libraries, with visions of Javascript bignum implementations dancing in our heads. But really, part of the point of Curve25519 is that the entire curve was carefully chosen to minimize implementation errors. Don’t write your own! But really, just use Curve25519.
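
    If you really do need a raw key agreement rather than crypto_box, here’s a sketch with the cryptography package’s X25519 support; the HKDF info label is just an example:

    ```python
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    alice = X25519PrivateKey.generate()
    bob = X25519PrivateKey.generate()

    shared = alice.exchange(bob.public_key())
    assert shared == bob.exchange(alice.public_key())

    # Never use the raw shared secret directly; run it through a KDF first.
    key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"example-handshake").derive(shared)
    ```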

    Don’t do ECDH with the NIST curves, where you’ll have to carefully verify elliptic curve points before computing with them to avoid leaking secrets. That attack is very simple to implement, easier than a CBC padding oracle, and far more devastating.

    The 2015 document included a clause about using DH-1024 in preference to sketchy curve libraries. You know what? That’s still a valid point. Valid and stupid. The way to solve the “DH-1024 vs. sketchy curve library” problem is the same as the way to solve the “should I use Blowfish or IDEA?” problem: don’t have that problem. Use Curve25519.

    Avoid: conventional DH, SRP, J-PAKE, handshakes and negotiation, elaborate key negotiation schemes that only use block ciphers, srand(time()).

    Website security

    Percival, 2009: Use OpenSSL.

    Ptacek, 2015: Remains: OpenSSL, or BoringSSL if you can. Or just use AWS ELBs

    Latacora, 2018: Use AWS ALB/ELB or OpenSSL, with LetsEncrypt

    You care about this if: you have a website.

    If you can pay AWS not to care about this problem, we recommend you do that.

    Otherwise, there was a dark period between 2010 and 2016 where OpenSSL might not have been the right answer, but that time has passed. OpenSSL has gotten better, and, more importantly, OpenSSL is on-the-ball with vulnerability disclosure and response.

    Using anything besides OpenSSL will drastically complicate your system for little, no, or even negative security benefit. So just keep it simple.

    Speaking of simple: LetsEncrypt is free and automated. Set up a cron job to re-fetch certificates regularly, and test it.

    Avoid: offbeat TLS libraries like PolarSSL, GnuTLS, and MatrixSSL.

    Client-server application security

    Percival, 2009: Distribute the server’s public RSA key with the client code, and do not use SSL.

    Ptacek, 2015: Use OpenSSL, or BoringSSL if you can. Or just use AWS ELBs

    Latacora, 2018: Use AWS ALB/ELB or OpenSSL, with LetsEncrypt

    You care about this if: the previous recommendations about public-key crypto were relevant to you.

    It seems a little crazy to recommend TLS given its recent history:

    • The Logjam DH negotiation attack
    • The FREAK export cipher attack
    • The POODLE CBC oracle attack
    • The RC4 fiasco
    • The CRIME compression attack
    • The Lucky13 CBC padding oracle timing attack
    • The BEAST CBC chained IV attack
    • Heartbleed
    • Renegotiation
    • Triple Handshakes
    • Compromised CAs
    • DROWN (though personally we’re warped and an opportunity to play with attacks like DROWN would be in our “pro” column)

    Here’s why you should still use TLS for your custom transport problem:

    • In custom protocols, you don’t have to (and shouldn’t) depend on 3rd party CAs. You don’t even have to use CAs at all (though it’s not hard to set up your own); you can just use a whitelist of self-signed certificates — which is approximately what SSH does by default, and what you’d come up with on your own.
    • Since you’re doing a custom protocol, you can use the best possible TLS cipher suites: TLS 1.2+, Curve25519, and ChaPoly. That eliminates most attacks on TLS. The reason everyone doesn’t do this is that they need backwards-compatibility, but in custom protocols you don’t need that.
    • Many of these attacks only work against browsers, because they rely on the victim accepting and executing attacker-controlled Javascript in order to generate repeated known/chosen plaintexts.

    Avoid: designing your own encrypted transport, which is a genuinely hard engineering problem; using TLS but in a default configuration, like, with “curl”; using “curl”, IPSEC.

    Online backups

    Percival, 2009: Use Tarsnap.

    Ptacek, 2015: Use Tarsnap.

    Latacora, 2018: Store PMAC-SIV-encrypted arc files to S3 and save fingerprints of your backups to an ERC20-compatible blockchain.

    You care about this if: you bother backing things up.

    Just kidding. You should still use Tarsnap.
