Article aside, EVERYONE with a marketing/technical blog should have an intro like fly.io does: a one paragraph explanation of what the service does.
It gives context for the article and immediately creates the possibility of converting people off nothing but the article, right out of the gate. Blows my mind that it's one of the first times I've seen this done.
Just a note for people who might not be aware: whatever marketing savvy you pick up from this post, one person you should credit is HN's own Dan Gackle. Check this out:
We wrote this piece (well, "they" did; I hadn't joined up with Fly yet --- this piece is part of why I did) but Dan edited it. I got to read the back-and-forth and revisions when I got here and it was bananas how much effort he put in. In blog posts, I just keep doing what he suggested in that Launch piece. Seems to be working?
Go look at all the "Launch HN" pieces. I think a lot of them are comparably informed by Dan.
I'm commenting about this because I think it's super interesting that one of YC's secret weapons is the moderator for HN, who informally moonlights as the editor for YC launch posts, and is spooky good at it.
Completely off track, but from that Launch post:
>>> Moving server apps close to users is a simple way to decrease latency, sometimes by 80% or more.
I am willing to believe, but ... what about all the time I spend caching, optimising my SQL indexes, buying new RAM.
Does sending a packet a few thousand extra miles.... oh never mind I think I am answering my own question.
It would be interesting however to have a service that does latency testing for different sites from different locations. Now maybe I could use fly for that?
And the other thing they do right is put a date prominently at the top of the page. So frustrating trying to figure out when the information on a page was actually relevant without knowing when it was published.
Problem is, they specifically don't want you to know its date. Technical marketing material without a date has no expiry date. Putting a date makes content old, and then you need to do additional work to keep the content updated. If it doesn't have a date it never expires... Not what you and I like and want, but what we get a lot of the time.
I think the signalling equilibrium works. People with high-quality fresh articles put the date on to gain a temporary advantage, so now when I see an article that doesn't have a date, I become suspicious that means it is old. Like no GPA or graduation date on a resume, it isn't fooling anyone.
> the stupid and annoying teaser pictures that force me to scroll down on a 27" monitor to even see some text
I think they're called hero images[1]. I think it's okay to have them, but scaling them up with monitor size is a web-design pattern that needs to die IMHO.
Sounds like a huge red flag to me then, and loses my confidence that this service is reliable, plus it makes it a hard decision for most (even me) to go with them.
It also signals to me it is not battle tested and results in lost sales.
You're getting downvoted but I don't think that's fair. Some people want social proof, we're not going to get them as customers. There's a reason "no one ever got fired for buying AWS".
I will say that most of our technical choices are informed, and even forced, by customer workloads. Almost everything we've built is the result of "battle".
Even this post – the change to containerd + devicemapper was the result of running a tremendous number of apps AND customers with 8GB+ images who wanted fast deploys.
For some people, the technical detail gives them enough confidence to try it out. And we're totally ok just attracting those people as customers right now.
> Sounds like a huge red flag to me then, and loses my confidence that this service is reliable
Some of our clients have policy that states they won't let themselves be used as a reference site under any circumstances. That includes us mentioning them in promotional literature, except if they specifically agree to (or suggest) some sort of cross-promotion opportunity.
Not even just "some" clients for us, but nearly all. Any major "traditional" corporation or public body is not going to let you market or promote generally without giving a huge amount away. Very occasionally they won't argue our default terms for software but will prevent it in a separate NDA.
In my experience almost all references to big corporations are made on the "ask for forgiveness, not permission" principle. So you could also argue that the companies in GP's comment are flying closer to the sun, legally speaking.
This is how it works for us. When we hire a marketing group, we'll work through the process, do case studies, etc. But it's more fun to write technical content.
All that is understood. The big question after the article: the "how" is delved into in great detail, but the "why" seems a little hazy. Is it just security that your customers want? What is the primary motivation for doing this? Your raison d'être mentioned somewhere would be cool.
Edit: from their website: "Deploy App Servers Close to Your Users. Run your full stack apps (and databases!) all over the world. No ops required."
I think the thing you're looking for is subtextual. A thing you may not have noticed about serious hosting providers is that none of them run general-purpose container workloads in multitenant Docker. You have to do something like we're doing to host Docker containers for multiple organizations. Docker itself --- a shared Linux kernel --- simply isn't isolated enough to protect customers from each other.
So: what we're describing isn't some breakthrough in hosting technology --- we're literally shoplifting the engine Amazon wrote to do the same thing --- but rather just explaining how we put these pieces together for our own product.
If I wanted to talk about things that were unique to Fly, I'd talk about our Anycast CDN and the way it automatically plugs into Fly apps, so that they are, out of the box, globally connected. Or about how we manage certificates, which we (I had nothing to do with it) apparently nailed the UX for so well that people use us just for that.
But that wasn't the point of this post; it really was just an effort to talk through some of the technical decisions we made.
Honestly though, what you said did entice me to start reading the article so there's also that. :)
It's OK to have a little more background before the start of the article. It's a technical audience; they'll not only accept it but, as you can see, some will actively ask for it.
I'm one of those weird people who thinks the world would be better if acquisitions didn't happen. Some of my best work is gone from the internet because a large company acquired it and let it rot.
Could you clarify what you mean by "I invest in people"? I found it quite fascinating but I might be misinterpreting or projecting.
Is it "I invest in people with sound business ideas", or "I invest in people whose approach has historically shown their company will do well", or is it something else?
I just mean that rather than just seek out technologies for the sake of technologies, I seek out people that I respect and see what they're tinkering with, or what they've built, and how they arrived at solutions to many of the problems that most of us face.
There are certain people whose skills and judgement I respect, and whose way of reasoning about things I feel I know well enough (as well as you can know from following them on the internet) that if they say this thing is good, I trust that.
One example is Yehuda Katz, who was a Ruby on Rails and jQuery core dev team member and now spends most of his time on Ember.js. I may not always agree with him (although it's rare), but I have found that his approach to engineering is just really well-rounded and well-reasoned. So you can totally see how he arrived at his conclusions. Like why to even put time into working on something like Ember.js, rather than joining yet another core team.
Another set of folks are the Hashicorp founders – over the years their approach to the industry and how they build out their solutions has really resonated with me. So I tend to trust just about anything they build.
Similarly, I feel someone like tptacek isn't just a security nerd but is able to articulate where the importance of security fits in with an org, with users, with businesses, with tradeoffs, etc etc etc, and seems to have a clarity in communication that I admire. Similarly, the fact that they are able to so confidently explain every aspect of their stack shows that not only are they proud of what they've accomplished, that they don't feel this is going to hurt them competitively. And the more I read about the stack, the more I keep nodding my head and just smiling at the sheer elegance in how they built it out. And I am actually someone who never really got on the Heroku bandwagon. It just never felt right.
And now I am sad because I've never met a person like you, even though many other people have told me that I am exactly that kind of guy -- a tinkerer and a critical thinker who rejects some of the status quo, sees it as the bad mess that it is, and doesn't pretend otherwise.
I completely agree about Hashicorp btw. They are for me one perfect embodiment of the philosophy you outlined: they challenge current preconceived notions and push back hard against unnecessary complexity.
Same for clarity in communication. Thinking about it, in my 19.5 years of career, I spent the last 3-4 years much more on improving my articulation than on improving my programming. The latter kind of started happening automatically because of the former. Felt quite pleasantly surprised by that.
So, thank you for existing, really. The world needs more people like you.
Fantastic writeup: technical details, context and framing, all in clear, informal, plain-English prose. It's profoundly effective, and vanishingly rare. As for Fly's solution per se: like many great ideas it seems simple bordering on obvious, in retrospect. I'm impressed and more interested than ever in bringing fly.io into my wheelhouse.
Final thoughts:
CLI-first: flyctl reminds me of `vercel` (fka Zeit `now`), which is a big compliment
> "VM build-and-boot process on a second deployment is faster than the logging that we do"
... ie, well-suited for lambdas / serverless functions at CDN edge -- which is of extreme relevance and interest to me and everyone else working on minimizing latency with modern webapp architecture... exciting times!
Pretty clever. It is pretty neat how much a small team can do by intelligently "remixing" the powerful primitives that now exist at the linux and VM/container layers.
The cool bit is how many interesting things are here or arriving: bpf, io_uring, and wireguard in Linux, plus wasm and all of the interesting things it opens up with fast, sandboxed execution. I fully expect that out of some of these techs will come innovation that shakes up infrastructure even more than cloud and Kubernetes already have.
Indeed! It’s striking what a difference there is compared to the 70s/80s when all tech in utility computing was proprietary and owned by IBM. These days companies like Amazon, Google and Facebook provide a lot of building blocks as FOSS libraries. The innovation rate is amazing.
Everyone is praising how well written this is, but I personally am confused: what do you actually GAIN by running a docker container without docker? Is it faster, or more secure? Why?
It is comparably fast (in practical terms; "of the same general order of startup cost") and much more secure. I didn't want to belabor the security thing, because we wrote extensively about that in a previous (linked) blog post.
You can skip to the grafs immediately before and after the string "If you’re running someone else’s applications, you should probably care a lot".
I don't think you can reasonably host general-purpose applications on a multi-tenant basis on shared hardware using container systems (ie: using directly shared kernels), for reasons that post gets into. It's for the same reason that AWS wrote Firecracker to run Fargate, which is also a container hosting service.
It's my post and I don't totally get the writing thing either. My M.O. with these posts: write it like it was an HN comment, and then edit the sentences to be shorter. I'm glad people like it, though.
That sandboxing post is the best article I've read on isolation in a long time. It is an absolutely fantastic way to get caught up on the last decade of solutions.
It's super convenient in that it should eliminate the "runs on my box" sort of errors. You should be able to run the same OCI image in Docker, FC, and kata and not worry about that. Really opens you up to freedom of where you dev.
Docker makes external dependencies management better, but it's an abstraction that leaks like a sieve when it comes to networking. If your /etc/docker/daemon.json file isn't set up just so, or your iptables are a bit off, or your vpn client is a bit too aggressive (e.g., Cisco VPN), the whole thing blows up.
Personally, I would rather manage dependencies than iptables rules.
This is like saying that, thanks to the car company with X security feature, it really doesn't make sense to have a competitor without that security feature. Why eliminate your options? You don't gain anything.
I think the important part is immediately after what you quoted: “using directly shared kernels”
If you’re running dockerd on one big server, you’re a single bug in either the kernel or dockerd away from someone in account A being able to attack account B.
This is also true about Firecracker or another VM approach but look at the relative exposed attack surface: if we’re on the same kernel, there are tons of different modules which I can attack — maybe even obscure ones you don’t or barely use - and if any of them allow me to run code, alter or leak memory, etc. mayhem ensues.
Running a VM reduces that risk because you’re sharing much less code & data but they traditionally added more overhead and startup delay, and if you’re deploying a traditional OS there’s a lot of management overhead. You’d see this as a security precaution but many people weren’t willing to pay for it.
Firecracker changes that calculation because creating a VM is fast enough and Docker means that you don’t need to support a full Linux distribution, just enough to launch the container image.
When most people don't understand the implications, offering a cheaper option that's almost guaranteed to inconvenience them (or be a catastrophe, depending on what they're doing) is not a fault that should be attributed directly (and certainly not solely) to them.
"don't think you can reasonably host" => I encourage our competitors to do this
"general-purpose applications on a multi-tenant basis" => any program a black hat who signed up with our automated system wants to run, next to all the other programs random people want to run
"on shared hardware" => one lump of iron owned by Fly
"using container systems" => containers vs VMs, in this case, and especially a VM manager that pays some attention to security.
Yes. The advantage of Firecracker is that it's lightweight and boots incredibly fast. It's a pretty good way to run containers. But the virtualization primitives are almost the same.
If it's using firecracker, it's probably using KVM virtualization while ensuring that the memory the VM consumes is not pinned... that is, that the VM can be swapped out of memory. For reference, firecracker was created by AWS to run and secure AWS Lambda. The hypervisor is written in rust and uses seccomp to eliminate unnecessary system calls. They open sourced it a few years back.
What you gain is a stronger security boundary. Just FYI, since 2019, you can also do this in Kubernetes using Kata containers + containerd which will happily shim firecracker. The setup is not simple though.
Overall, fly.io building infrastructure on this pattern and making it accessible is fantastic. Looking forward to seeing how this continues to evolve and am happy to see more infra build on top of firecracker. Very exciting!
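For a feel of how small the moving parts are, here's roughly what a minimal Firecracker launch looks like with the public CLI. This is a sketch of the upstream getting-started flow, not Fly's setup; the kernel and rootfs paths are placeholders.

    # Minimal microVM config per the public Firecracker API docs
    # (kernel/rootfs paths are placeholders).
    cat > vmconfig.json <<'EOF'
    {
      "boot-source": {
        "kernel_image_path": "vmlinux.bin",
        "boot_args": "console=ttyS0 reboot=k panic=1"
      },
      "drives": [
        {
          "drive_id": "rootfs",
          "path_on_host": "rootfs.ext4",
          "is_root_device": true,
          "is_read_only": false
        }
      ],
      "machine-config": { "vcpu_count": 1, "mem_size_mib": 256 }
    }
    EOF
    firecracker --no-api --config-file vmconfig.json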
Exactly what I was wondering. It's a lengthy explanation of how they provide a commodity service (i.e., running vanilla docker containers from vanilla docker registries) by doing all sorts of clever and seemingly complicated things. But in the end what you buy is the ability to run a docker container that you publish to a docker registry and that you develop and test using docker.
I'm sure there is some advantage to all that cleverness, but it's not at all clear what that is (cost?). So, I disagree that the article is that well written. There's a lot of what and how but not a whole lot of why that matters to their customers. It kind of fails to communicate that part. It could be as simple as "you pay less for more because we did X, and this is how we did it and why that matters to you".
Another point of view would be the kernel version. If your service depends on a different kernel version, it might be good to have a docker-like VM which could be created from a Dockerfile; this is good for CI/CD.
But in the general case, their target should be security.
I like that it comes across as competent without sounding boastful. There's obviously pride in there, but balanced with honest commentary about parts that aren't perfect.
I agree, I'm not even that much into docker stuff, but I couldn't stop reading. It was a fun read: took me on a journey, I learnt stuff and I feel more intelligent now.
I am lucky enough to have read the drafts. These articles are a huge grind. Most of the skill seems to be putting the time in and having enough experience to go as deep as necessary.
It's funny, if you look at my drafts, most of what I do is just shortening sentences. (I don't think I'm a particularly good writer, and I'm fascinated by good writing, so this is an interesting comment, and I appreciate it).
I'd really like to use Fly for a project that is supposed to go serverless, but may never evolve out of the "Docker on EC2" phase of life. Have you had any third-party security audits done? It'd check a lot of boxes for me.
If you haven't tried Fly yet, where have you been?
- Anycast IPv4 to VMs hosted near the edge
- Raw TCP/UDP servers or external HTTP/HTTPS termination
- $.02/GB egress, no per-request pricing
You can build almost anything with this model (VoIP, video, gaming, Heroku-like, App Engine-like, Lambda-like, ..), and the bandwidth pricing is astounding. Fly have an obviously bright future, they aren't even on the same page as existing CDNs.
Is $.02/GB egress supposed to be astounding in a positive way? That's still highway robbery compared to what you get on DigitalOcean, OVH, Hetzner Cloud... literally any provider that's not AWS or GCP.
Maybe some folks have millions of VC money to blow, but I would never build a VoIP or gaming (= bandwidth intensive) app on this.
$0.02/GB is higher than we want to charge, but we have to pay the same rates you do at most providers. It's a little unfair to call it "highway robbery", though; we have to price things so we can stay alive.
We're pushing that price down pretty regularly, you can read some details about it here:
> Inflicting that complexity on you all would help with margin control, but ugh. What we've done instead is set a blended price that fits most apps running on Fly.io, and decided to just eat the extra cost from outliers. If you want to exploit that, run an app in Sydney with a whole bunch of users in India. We'll lose money on your app and you will win one round of capitalism.
These are very different kinds of bandwidth. The only provider above to offer anycast IP (AFAIK) is AWS Global Accelerator.. it's $.015/gb on top of the regular $.09ish/gb EC2 egress charge. GCE offer anycast load balancing but it is (AFAIK) HTTP only and even pricier than AWS.
With Fly you're getting the raw magic behind Lambda@Edge, CloudFlare Workers and every anycast DNS and edge reverse proxy service in existence, with a large chunk of compute, RAM and local storage to build out whatever exact combination of CloudFront or whatever else you like thrown in as part of the deal. Also if you want to skimp on backends, it's still possible to use their HTTPS termination (ala GCE) to reduce handshake latency without rearchitecting everything
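Back-of-the-envelope with the numbers above: 1 TB/month of egress is about $20 at Fly's $0.02/GB, versus roughly $90 at EC2's ~$0.09/GB, plus about $15 more if you add Global Accelerator at $0.015/GB.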
I can't quite figure out if you pay for bandwidth based on where your servers are, or where the people are (the first would make the most sense to me), and Indian traffic is very expensive by their measure.
But do note that you get a lot of free bandwidth to play with.
> Inflicting that complexity on you all would help with margin control, but ugh. What we've done instead is set a blended price that fits most apps running on Fly.io, and decided to just eat the extra cost from outliers. If you want to exploit that, run an app in Sydney with a whole bunch of users in India. We'll lose money on your app and you will win one round of capitalism.
Nice, but still one worry: A bug, or an attacker, sends outbound traffic through the roof, and newbie's fun experiment ends in a huge bill. More than one such story on HN recently (AWS).
Skimming the pricing docs just now, I don't see protection against bandwidth overage:
Is this more like a Lambda alternative or can you run long running processes? If so, how much disk space comes with a plan since I don't seem to be able to find that info.
Honestly the more I read these posts, the more I want to use the product. :) (Full-time elixir dev running a SaaS and 8+ projects on elixir, including covid vaccine scheduling system that handles thousands of users with no issues. LOVE elixir.)
I don't even write Elixir and I love it, because we built a system that makes it super easy to take a Dockerfile of some random thing, and deploy 20 instances of it spread evenly around some bunch of regions that can talk privately to each other by default.
That's a fun thing to be able to do in any language, but to exploit it in a Go program, I'd have to actually think, pull something like Serf in, whatever.
But Phoenix LiveView apps literally ship with a dashboard where you can see all your instances running, and then first-class features for them to chat between themselves.
LiveView is cool all on its own, but Elixir+Phoenix+LiveView is such a sweet application of a hosting environment like ours. Expect us to be weirdly chatty about Elixir in the coming months.
I'm looking forward to this. Is it going to be possible to run local Postgres (or one of the distributed dbs that are PG compatible) read instances alongside the Elixir instances?
I can't find anything about postgres pricing. Also, the section on extensions in your doc is blank. Since postgres is just a fly app, would it make sense for a user to just fork it and add extensions they needed? That would be awesome if feasible.
Yep! Our internal service discovery lets you connect to instances in the same region. If you're running in "SJC" you could use "sjc.my-postgres-cluster.internal", for example.
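A tiny sketch of what that looks like from an app instance (app and database names are hypothetical):

    # From an instance running in sjc, reach the same-region Postgres
    # over the private network via Fly's internal DNS.
    psql "postgres://app_user:s3cret@sjc.my-postgres-cluster.internal:5432/app_db"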
The general deployment story with elixir leaves a lot to be desired. I hope this is one of the things that you guys are planning to be chatty about! We currently use edeliver and it just feels clunky.
I’d love to just push a branch and type flyctl scale and stuff just ends up being clustered. (That’s probably already supported?).
Actually, I’m going to use fly for a couple of our projects. Just realized it. We’re tired of Linode and were prepping to move it all to AWS but I wasn’t looking forward to the devops maintenance for how small the projects currently are.
I actually would love to build an FaaS on top of elixir for our product, that’s ending up accidentally having low-code elements in it. The process isolation you get for nearly free seems like a no brainer for such a platform.
Another potential would be hosting Wordpress.
Having great tutorials for these use cases will be killer.
I'm surprised how well liveview copes with a flood of events. I've seen the latency indicator at the bottom spike at times, but it mostly just hums along.
Nice writeup, but another detail I'm very positively surprised about - no external resources, no 3rd party JS libraries or external tracking, not even on their landing page. This is an exceedingly rare thing these days!
This is the opposite of what I was hoping for from the title, because I'm working in kind of the opposite environment: everyone is trusted, we don't need strong isolation, but multiplexing processes onto hardware is tricky (and currently handled mainly through ansible which feels pretty hacky).
Is anyone doing orchestration without containers? I'm working on the JVM so we already have well-isolated single-file deployables, but orchestrating which of them get run on which server is still significantly tricky. I'd love to be able to use the orchestration part of something like Kubernetes without having to worry about all the extra complexity of docker/containers.
Sounds like that's oriented towards using a chroot and a cgroups namespace which is exactly the kind of thing I'm trying to avoid (given how it complicates debugging etc.). But maybe I can use their "java driver" and make sure nomad doesn't run as root?
I'm looking for "I have these machines and I have these programs (with roughly these CPU/memory requirements) and I'd like to deploy x copies of process y and z copies of process w, sort it out".
Sounds like nomad might be what you're looking for? They have good support for non-containerized workloads which ultimately just boil down to "if scheduled on this host run this process."
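To make that concrete, here's a hedged sketch of what such a job might look like with Nomad's java task driver; the job name, artifact URL, and sizes are made up:

    cat > billing.nomad <<'EOF'
    job "billing" {
      datacenters = ["dc1"]

      group "api" {
        count = 3                 # "deploy x copies of process y"

        task "server" {
          driver = "java"         # no container; runs the jar on the host

          artifact {
            # fetched into the task's local/ directory
            source = "https://artifacts.example.com/billing-api.jar"
          }

          config {
            jar_path    = "local/billing-api.jar"
            jvm_options = ["-Xmx512m"]
          }

          resources {
            cpu    = 500          # MHz
            memory = 512          # MB
          }
        }
      }
    }
    EOF
    nomad job run billing.nomad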
I've gotten a bit sick of running Docker on my dev systems and switched to Podman. I found a "podman-compose" script that works great for setting up my dev environments just like I'm used to, and I can run everything as my own, non-root user. When my laptop inevitably starts running out of space, there's a single directory of images in my home directory I can rm -rf to start over. It's been great.
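Roughly the workflow being described, for anyone curious (the storage path may vary by distro):

    pip3 install podman-compose       # the wrapper script mentioned above
    podman-compose up -d              # same compose file, rootless
    # rootless image/container storage lives under the home directory,
    # typically ~/.local/share/containers/storage, so "starting over" is just:
    podman system reset               # or: rm -rf ~/.local/share/containers/storage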
We use Fly to run our website & control panel, which were previously hosted on Heroku. Not a big workload, but we were able to move it regionally closer to our backend services to improve performance.
The deployment experience is just as easy as Heroku. Perhaps easier.
Accidental? I reckon those responsible are likely incapable of avoiding unnecessary complexity. Further, I think any talk of "simplicity" may make them uncomfortable.
Distributed systems at scale are hard. These technologies are coming out of the largest, most successful companies on the planet. To assume that things are complex because of some naivete or stupidity (rather than because they solve complex problems) is hubris of the highest order.
It's definitely not naivete or stupidity, but projects from large companies inherit all the complexity of the organization that created them. This is sometimes because they're very broadly scoped, and sometimes because large groups of people build complex things by default.
Simplicity is important for people who are in smaller groups. This is why many people who dislike kubernetes like Fly.io, and people who love kubernetes think we're building a toy.
Fly is really nice. I’ve been getting my head around microVMs recently to build some infrastructure around them. Even started building some tools to rebuild them directly from Dockerfiles and Docker images.
Similarly, I recently discovered it is possible to run Docker containers in LXC relatively easily using the provided "lxc" template. We use this to run a few pieces of software whose only supported method of distribution is Docker.
But there's a number of things that don't quite work how we want it to out of the box, for example, our DNS isn't configured. It seems fly solves this with their custom init system, so I'm really excited to play around with that! Thank you for sharing!
So by this do you mean running docker inside lxc, or do you mean something closer to what these folks have done -- unpacking the layers into lxc and using them directly?
I would be happy to, but I don't really have one. I can give you a really quick rundown of how to do it, though.
1. Install jq, umoci and skopeo on your host.
2. Run "lxc-create -n <name> -t oci -f <temp-config> -- -u docker://<link to container on registry> --no-cache"
3. Start the resulting container with "lxc-execute -d" rather than "lxc-start".
The <temp-config> file should contain a uid and gid mapping for LXC. If you don't intend to run the containers rootless, you can omit it and "--no-cache" (the caching mechanism is broken for rootless containers).
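In case it helps, the <temp-config> mapping is only a couple of lines; the ranges here are just the common defaults, so adjust to whatever your /etc/subuid and /etc/subgid allow:

    # <temp-config>: map container root to an unprivileged range on the host
    lxc.idmap = u 0 100000 65536
    lxc.idmap = g 0 100000 65536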
After container creation we patch the generated config file with our network configuration, etc. similarly to what we would do with a normal LXC container.
This has worked with all containers we tried it with, but if a container relies on interacting with the Docker daemon directly via docker.sock it obviously won't.
A colleague has had success adding the init system from a container (in that case tini) and changing the config so the container can start with lxc-start; this was a much more manual process, though.
Thank you! The votes on my post suggest there are a few people interested in this, so I might eventually write a small blog post on this.
Running the resulting container in Proxmox was actually the reason why my colleague needed the container to start with lxc-start rather than lxc-execute. The process involves finding the init system in the container and setting up lxc.init.cmd rather than lxc.execute.cmd, which is set by lxc-create. We have not yet investigated whether it might be possible to automate this for the majority of cases, but this is something we might look at in the future.
They can with a few extra steps that are still somewhat manual. Specifically, changing the configuration such that using lxc-start rather than lxc-execute is possible. It may be possible to automate these in the future, it's something I'll eventually look at.
So I've had a question running around my brain for a while.
Has anyone tried using docker-compose files to deploy direct to separate cloud machines?
By which I mean, given some config with user creds to AWS or Digital Ocean or Hetzner or whatever, running:
dockerless-compose some.address.com up
could spin up servers (ec2 instances, droplets, maybe some kind of hinting in the docker-compose.yml file to show sizes), maybe set up a virtual network if the hosting provider supports it, set network aliases in the /etc/hosts, set some.address.com to be the gatekeeper (the system would need to install SSH on your boxes, but it means that it could set up an ssh authorized key from the gatekeeper box)
And then, maybe it doesn't need to be actual cloud servers. It could be bare metal. It could be chroots. It could even be a single host with docker (a bit pointless, but maybe useful for proving parity with docker-compose)
Essentially, suddenly we have a replacement for the likes of terraform in many organisations. Your local developer setup becomes exactly the same as your production setup.
Am I missing something? Is that where we're headed with this, or would this be a complete nightmare?
The catch is that this doesn’t replace a tool like Terraform unless your projects are extremely simple and you don’t have much in the way of security requirements. Once you need anything more, Compose doesn’t have the information you’d need and you’d lose the veneer of simplicity and portability trying to add it in.
Other than that, I think there's a reason why things like LXD and k8s (maybe Joyent/SmartOS belongs in this list) generally have one layer for the actual hardware, and a higher-level abstraction on top.
Otoh, I suppose something like canonical metal-as-a-service might be able to do more with a little help. In fact I think maas+lxd is probably pretty close to being able to work with a compose-like tool and docker containers.
I believe this was docker's (the company) direction for a while, but it was spread across multiple tools. docker-machine does provisioning and docker-swarm runs the compose file on remote hosts.
This is really just a curious question, but I've been playing a lot with overlay filesystems at work. Have you tried mounting the tars as overlays directly? (I'm not sure which driver, but to my knowledge podman does it this way.) That way you wouldn't have to unpack the tarballs, and you'd also have an immutable base for the container.
Tarballs must be compressed in a very particular way to support random access, and even that only works with on-demand decompression using a slow compressor (zlib). In Fly's model this would probably also need to happen inside the container's VM.
All together it might save a little one-time startup cost, at the expense of making every other operation on the filesystem much slower for the life of each container.
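For contrast, the conventional arrangement (what Docker's overlay2 / containerd's overlayfs snapshotters do) unpacks each layer once and stacks the directories, roughly like this (paths are illustrative):

    mkdir -p /var/lib/layers/base /var/lib/layers/app /run/ctr/{upper,work,merged}
    tar -xf base-layer.tar -C /var/lib/layers/base
    tar -xf app-layer.tar  -C /var/lib/layers/app
    # leftmost lowerdir is the topmost read-only layer
    mount -t overlay overlay \
      -o lowerdir=/var/lib/layers/app:/var/lib/layers/base,upperdir=/run/ctr/upper,workdir=/run/ctr/work \
      /run/ctr/merged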
I have been following microvms and kubernetes lately. Nice thing about leveraging OCI is that you can use cri-o and orchestrate using kubernetes. I am pretty bullish on microvms.
Weaveworks has been doing some good stuff around an OCI interface for microVMs. Ignite[1] is a more robust, productized version of what is mentioned in this blog. Another project that should be watched is Kata Containers[2].
Who are the people behind fly? It looks to me like the first thing I've seen in a while that actually might be "a better heroku". Anybody out there using it?
Anybody who was happy with heroku for mostly everything other than price that migrated to fly and remained happy?
I'm really impressed by fly.io, and the candidness with which they share some of their really awesome technology. Being container-first is the next step for PaaS IMO and they are ahead of the pack.
I aim to build a platform like theirs someday (probably not any time soon) but I don't think I'd do any of what they're doing -- it feels unnecessary, because I feel like I know lots of projects that have actually already done the hard work for this. I probably only think this because I haven't actually done it, but think it's worth sharing the tech anyway. Bear with me as I recently learned that they use nomad[0] and some of these suggestions are kubernetes projects but I'd love to hear why the following technologies were decided against (if they were):
- kata-containers[1] (it does the whole container -> VM flow for you, automatically) with multiple VMM options[2]
- linuxkit[3] (let's say you didn't go with kata-containers, this is another container->VM path)
- kubevirt[5] (if you just want to actually run VMs, regardless of how you built them)
- Ceph[6] for storage -- make LVM pools and just give them to Ceph, you'll get blocks, distributed filesystems (CephFS), and object gateways (S3/Swift) out of it (in the k8s space Rook manages this)
As an aside to all this, there's also LXD, which supports running "system" (user namespace isolated) containers, VMs (somewhat recent[7][8]), live migration via criu[9], management/migration of underlying filesystems, runs on LVM or zfs[10], it's basically all-in-one, but does fall behind in terms of ecosystem since everyone else is aboard the "cloud native"/"works-with-kubernetes" train.
I've basically described how I plan to run a service like fly.io if I ever did -- so maybe my secret is out, but I sure would like to know just how much of this fly.io got built on (if any of it), and/or what was turned down.
When we started with Firecracker, there weren't really any projects built on top of it.
Even if they had been, though, most things that touch kubernetes are incredibly complex. We have to be very careful how we manage complexity.
For example, we decided not to use any CSI or CNI projects for our storage and networking because those are designed to be pluggable. We don't need pluggable, and it's simpler to just build the very small things we did need.
Reasonable -- thanks for expanding on this. No idea what your management infrastructure is like -- would you mind expanding a tiny bit on how you power some of the things that CSI/CNI would give you, without the complexity? For example taking a disk snapshot or enforcing workload-specific network firewalling?
If I think of the choices for pre-built orchestration being:
- LXD
- Kubernetes + CSI/CNI
- Nomad + a little bit of orchestration
I think I know the answer for some of the other options, but not Nomad -- how I'd get some of the features that CSI/CNI provide. Do ya'll just run jobs that take and ship LVM volume snapshots? Is live migration supported (to be fair this isn't really easy/possible/built-in anywhere except LXD AFAIK yet), or do you do a rolling traffic-shifting and application-replica-shooting process?
Looks like I need to read up on nomad this weekend -- looks like it's got just enough for you all to productively use it and build just what you need on top. I have to admit I haven't given it enough of a read/examination.
That's exactly why we used nomad. It's focused and easy to understand, and we can build on top of it. We've even patched it on occasion.
We manage LVM ourselves, yes. We don't do any live migration. Our disk offering is relatively new so we haven't exposed much snapshot based functionality to them. Most of them just want to know backups are happening and disks are encrypted, though.
Letting customers connect to their networks with wireguard is something we probably couldn't have done well with k8s and CNI. Or at least, we'd have been fighting cni complexity to make it work.
I have to say I’m a bit puzzled at the phrase “fighting CNI complexity”.
CNI says you exec a process to connect your network and give it json on stdin, encoding what you want it to do.
I get that a function call to code you wrote is simpler, but is json+exec really “complex”?
I don't know if it's simpler, and I don't have a dog in this fight, but: our users get direct access to a standard IPv4/IPv6 network that behaves the way you'd expect those networks to work, with a private IPv6 address/hostname they can bind to for services that should only be available to applications in their own organization, and a DNS server that quickly gives you all the available private addresses of an app as `my-app-name.internal`, with no mTLS and no sidecar proxies, and we implemented this in about a thousand lines of Rust and a couple hundred lines of eBPF C.
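As a small illustration of that last bit (app name hypothetical, output shape illustrative): from another instance or a peered WireGuard client, the internal DNS hands back one private address per running instance.

    dig +short aaaa my-app-name.internal
    # returns one AAAA per instance, e.g.:
    #   fdaa:0:18:a7b:7d:1:8a2:2
    #   fdaa:0:18:a7b:7d:2:4c1:2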
I hear "CNI" and I think "Istio". Is there a CNI plugin that would have gotten us this stuff for less effort?
So I agree with that phrase -- I'd say that the phrase CNI complexity generally refers to at least these things for me:
- the complexity of networking in general, which is exposed. If I have to know what a CIDR/IPAM/etc are, then that complexity has leaked through.
- the near-oneness of kubernetes and CNI -- CNI seems like a small facet of kubernetes (especially early on) but choosing the right provider, knowing what that JSON means etc is a big source of complexity -- combining that with the (usually) new world of complexities that k8s wrangles can be too much.
- deciding that every pod must be addressable was a mistake IMO. It would have been harder to design but I think some sort of connection/network proxy should have been developed up front and iterated on (maybe I'm unaware of the history and this did indeed happen). When I spin up two containers, if they ought to talk to each other, I would have preferred to make some `NetworkConnection` object over the ports they need to speak over, or some virtual network they should be a part of. Security would be improved, and the explicitness of connection/being present in a network could have obscured the underlying networking complexity.
- debugging CNI brings you back into the regular world of linux networking -- tcpdump/iptables/ipvs-lvs/conntrack/etc. When things break down in those layers people are even more confused than with regular containers, since just about everyone knows top/ps and maybe even strace, but far fewer people know how tools like ip/ifconfig/etc work.
CNI is for the most part really simple and really easy to use these days, but I definitely can't say I'm confident with it -- I've been running kube-router since I found it because it was simpler than kube-proxy + flannel + calico (there's also canal). If you use some of the newer distributions (k3s, k0s) you don't even think about it.
Also not securing node<->node communication was a pretty rough decision as well IMO. The Konnectivity service is a pain to think about and deploy, but luckily for me I can sidestep the whole thing with wireguard -- and calico will do it for me. There's also Cilium out there which does similar things.
CNI is a smashing success -- people are building innovative things, competing, and clusters share the benefits -- but I still don't know that I'd call it simple yet, I just hope it never breaks.
True there is a certain bit of mis-attribution but I think others are lumping them together as well right now. It's easy for me to sit back and say it now, but I think CNI could have done more to obscure/shift away from the underlying complexity.
To lean off my armchair a bit and offer some parallel work -- I was pretty impressed with the design goals/interface of Ouroboros[0][1].
There just needs to be more projects (nomad is one of them) that are adopting CNI and simpler than k8s.
Do want to take this chance to thank you and your team for your work on this throughout the years, though. I am very grateful for CNI existing and the foundational work ya'll have done over the years.
Weaveworks has also been supporting/crafting/pushing the limits of what networking (and the CNI, of course) on Kubernetes could be, from near the beginning. Was a treat to hear Alexis talk about the company on the Kubernetes podcast[0] (I just saw that part 2 is out, listening to it now!).
I looked back in my YT history and you show up there three times (lots more videos with other folks from weaveworks):
Cortex - Infinitely Scalable Prometheus[1]
How We Used Jaeger and Prometheus to Deliver Lightning-Fast User Queries[2]
Golang UK Conference 2016 - Bryan Boreham - An Actor Model in Go[3]
> We manage LVM ourselves, yes. We don't do any live migration. Our disk offering is relatively new so we haven't exposed much snapshot based functionality to them. Most of them just want to know backups are happening and disks are encrypted, though.
Reasonable -- if all the workloads run in containers to start with and are service-y, probably not a real need, and certainly not a real need if no customer has asked for it / had it be a deal breaker.
> Network isolation was much simpler with a little bit of bpf and a ipv6 tricks. Some of that is detailed here: https://fly.io/blog/ipv6-wireguard-peering/
> we'd have been fighting cni complexity to make it work.
Appreciate the candid responses, thanks for taking the time. That ipv6 wireguard peering post was really fascinating; I read that too. Wireguard has been quite the game-changer in its space as well, and a lot of the value IMO is just in the simplicity and difficulty of misconfiguration, even though the performance is also fantastic.
Grateful that ya'll are sharing what you're doing right/finding interesting.
Since ya'll might appreciate this, I think there's an ultimate form of all these orchestrators out there that boils everything down to the "operator pattern" -- I call it "buhzaar" but I tried to get my thoughts out of the notebook a while ago[0]. It's almost like a completely normalized DB might be -- stripping an orchestrator down to its bare minimum, which facilitates other processes that do resource provisioning and management. Then let people bring their own things that provision resources (and maybe ship some "officially supported" ones, but they all live separately and iterate separately).
I didn't quite put down all the thoughts I had, but do you think this is too much normalization (in the same way no one wants to do 7 joins)? You could argue that both nomad and k8s are denormalized (they intrinsically "know" how to provision/manage certain things) to a certain extent, and nomad just "bundles" less.
Reading some of these comments, I almost started thinking about a docker2zfs tool that would take an image and build/dump a (possibly encrypted) zfs snapshot. And a service that takes a zfs snapshot, mounts it, and runs it like a vm/container/jail...
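A rough sketch of that idea, in case anyone wants to poke at it (dataset and host names are hypothetical): flatten the image into a dataset, snapshot it, and ship it with zfs send/receive.

    zfs create tank/images/myapp
    cid=$(docker create my-image:latest)
    docker export "$cid" | tar -x -C /tank/images/myapp
    docker rm "$cid"
    zfs snapshot tank/images/myapp@v1
    zfs send tank/images/myapp@v1 | ssh host2 zfs receive tank/images/myapp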
TIL - I did not know this about Nomad, but I assumed they weren't using CSI or CNI because they mentioned managing LVM themselves in the article (and in this/previous article they mention bpf for networking so definitely not using CNI), so with full context the question would be "how are you providing the functionality without relying on any of the pre-built ecosystem stuff listed above?".
Nomad allows you to rely on the host OS's + task driver primitives for storage and networking (e.g. mount a docker volume, use the docker bridge network, mount a host folder, get an IPv6, etc.) or pass through CSI and CNI for those things for more complex scenarios.
>As batshit as this plan is, it works surprisingly well; you can build gVisor and runsc, its container runtime, relatively easily. Once you have runsc installed, it will run Docker containers for you. After reading the code, I sort of couldn’t believe it was working as well as it did, or, if it was, that it was actually using the code I had read. But I scattered a bunch of panic calls across the codebase and, yup, that all that stuff is actually happening. It’s pretty amazing.
>You are probably strictly better off with gVisor than you are with a tuned Docker configuration, and I like it a lot. The big downside is performance; you’ll be looking at a low-double-digits percentage hit, degrading with I/O load. Google runs this stuff at scale in GCE; you can probably get away with it too. If you’re running gVisor, you should brag about it, because, again, gVisor is pretty bananas.
Just curious, what kind of host OS is Fly running for the machines hosting the VMs? A custom immutable image built with something like LinuxKit? A stock OS designed to be a container host, like Flatcar Linux? Or a conventional distro?
I need docker in docker. I’ve already virtualized at the bios and I’m running xen within xen. The only thing left is to run virtual box and then dosbox.
From a Dockerfile, it's not as simple without creating an image first.
The command @tptacek means is most likely docker export for a running container and docker save for an image. The first exports a container's filesystem as a tar archive; the second exports a complete image, with its layers and metadata, as a tar archive.
Our local dev environment does this, but it does it by running our driver (so we can test the driver) so it's a lot more mechanism than you'd need.
You could get this done yourself pretty easily: just unpack the Docker image (there's a Docker command that does this, I forget what) onto a mounted loop device, then hand its block device off to Firecracker.
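A hedged sketch of that flow (the command is docker export, as another commenter notes; sizes and names are illustrative):

    truncate -s 2G rootfs.ext4
    mkfs.ext4 rootfs.ext4
    mkdir -p /mnt/rootfs
    mount -o loop rootfs.ext4 /mnt/rootfs
    cid=$(docker create my-image:latest)
    docker export "$cid" | tar -x -C /mnt/rootfs
    docker rm "$cid"
    umount /mnt/rootfs
    # rootfs.ext4 can now be handed to Firecracker as the root drive
    # (path_on_host / is_root_device in the machine config).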
Our driver does its own local address goop, but it's not complicated (it just dumps netlink and allocates the next available subnet in a particular range).
Having said all that: there are projects that do this already; for instance, I think there's a Kata Firecracker somewhere.
TLDR: they run containerd instead of docker daemon to deploy docker images to firecracker VMs. GCP has a somewhat similar “container on VM” product (though not on firecracker VMs, just GCE VMs) https://cloud.google.com/compute/docs/containers/deploying-c....
It's somewhere between Heroku and DigitalOcean. We have a CLI for managing and deploying apps that's very similar to Heroku. But apps also get private networking, disks, load balancing for TCP/UDP.
We do have people deploy single instance apps and use them just like you'd use a Droplet. "fly ssh console" will SSH you directly into them, and with a persistent volume you can do all kinds of fun stuff.
Haven't heard about them, but it's more like Heroku: not pure VMs, but a PaaS where you deploy apps using a CLI and they manage/scale machines for you.
The parent comment is getting downvoted into oblivion (because, again, we're a hosting provider) but it's an honest observation! It just happens to have a simple answer.
All these contortions and complications, instead of just learning how to make pkgsrc packages, going to SmartOS and running in zones. Unbelievable, how needlessly complex this is. Awful.
Annatar's saying there's a solution out there with fewer moving parts than what the industry is moving towards. Rather than the Dropbox situation, which was the opposite.
I'm not that familiar with any of these, but SmartOS sounded cool when I looked it up.
My point on the other hand is that all that flapping in the fly.io article is too complex and completely unnecessary because that problem can be solved with a better, simpler solution which has existed far longer.
I also dislike docker. But it's a de facto standard. We have exactly one thing we want to "innovate": running app servers close to users.
Which means we support what people already use. Postgres is great but I wouldn't choose to run it for fun. Our customers use Postgres. Node is not my favorite runtime, but we have good support for Node because it's popular. Devs are willing to package their apps as docker images. So we run Docker images.
Your better, simpler solution would cause our company to fail. I'm very familiar with Joyent and SmartOS. They lost the container battle, despite the elegance.
Nothing is lost. If you are IT professionals, it's your job to advise clients what to run, not the other way around. I know that when it comes to computers, everyone feels like they have something to say, but it's just like any other profession: we are the experts that went to university and have the requisite degrees stemming from our education, we are the experts with the help of our professional experience. Therefore we should be the leading force for our clientele, not the other way around. Customer is king, but that does not preclude or exclude professionals taking the leading role. If your sole business plan revolves around service, banking on a single piece of specific technology, your business is already doomed, as was every IT business before it. History is the teacher of life, and those who do not learn from history stand to repeat it, poorly and with dire consequences.
Docker's days are numbered: too many scandals, Google gave up on it. It is only a function f(t) before it goes out of fashion, just like everything else in IT.
Post scriptum: why persist on the illumos-zones-ZFS-configuration management with OS packaging constellation? Because it is bullet proof not for months or years, but decades thereafter. It is good business: less maintenance investment means more profit long term, and in business, it is the long game, and not trend pandering which counts. No business survived on trend pandering, for reasons which should be quite obvious.
You already unpack Docker, and when I read what all you had to do, it was obvious to me that what you are doing is far more expensive than if you had gone the SmartOS-lx-branded-zone-ZFS route; you could have chosen to do so with the bulletproof substrate that is the lx-branded zones, the illumos kernel, and ZFS. The customer in your model is none the wiser anyway, but you did not do so, which is illogical, since you take it all apart and custom-deploy it on custom infrastructure anyway. The system engineering and maintenance costs will kill your business long term, not to mention what they will do to each and every one working on that in terms of health and personal life. Babysitting computers and software will never be a profitable position.