
> Li correctly points out that the Archive's budget, in the range of $25-30M/year, is vastly lower than any comparable website: By owning its hardware, using the PetaBox high-density architecture, avoiding air conditioning costs, and using open-source software, the Archive achieves a storage cost efficiency that is orders of magnitude better than commercial cloud rates.

That’s impressive. Wikipedia spends $185m per year and the Seattle public library spends $102m. Maybe not comparable exactly, but $30m per year seems inexpensive for the memory of the world…



I think the culture is one of 'we are doing this for all humankind' and when you get just a few smart people bought in on that level of commitment and they're trying to be lean (and also for sure underpaying themselves compared to what they might make at Big Tech) then you can get impressive results.


I look at the 1990s picture of Brewster Kahle and think: He surely didn't get paid as much as me, but what did I do? Play insignificant roles in various software subscription services, many of which are gone now. And what did he do? Held on to an idea for decades.

The combined value of The Internet Archive -- whether we count just the infrastructure, just the value of the data, or the actual utility value to mankind -- vastly outperforms an individual contributor's output at almost every well-paying internet startup. At the simple cost of not getting to pocket that value.

I wish I believed in something this much.


If you think that's fucked up, do you know how little we pay teachers? Especially preschool-K? Clearly money is just a metric for how much moneying the money had been able to money. Goodhart put it another way: "When a measure becomes a target, it ceases to be a good measure."


1k teachers in Arizona have quit in the last six months because of this.

Over 1,000 Arizona teachers resigning plays a part in shortage - https://news.ycombinator.com/item?id=46728151 - January 2026


I was a CS teacher for the past two years, so yes. I did it for quality of life reasons while my son learned to walk. But I almost doubled my salary going back to being a software dev.


You can trade off cloud costs for developer time.

AWS is priced as if your alternative was doing everything in house, with Silicon Valley salaries. If your goal isn't "go to market quickly and make sure our idea works, no matter the cost", it may not be the right fit for you. If you're a solo developer, non-profit, or another organization with excess volunteer time and little money, you can very often do what AWS does for a fraction of the cost.


I've found that for data-intensive workloads it isn't just a trade-off—the markup on egress and storage often makes the business model mathematically unviable. I'm bootstrapping a service with heavy image generation and the unit economics simply don't work on AWS.


aren't we told all the time though, that a board of directors beholden to shareholders and a god given edict to make numbers go up are the only way to do things efficiently, to be lean and productive? are you telling me that when people find there's a need for something to happen, they make it happen? for the good of mankind? no billionaires?


[flagged]


it's literally the BS we're given for privatisation. Here in the UK, the train network is shittier than ever and there's no competition. the water companies are literally pouring shit into the sea while paying themselves billions in dividends and putting the companies in massive debt.

we were told the profit motive and competition would make them efficient.


They have! They're way more efficient at making their owners rich. Before, there was this whole "having to provide a service" thing that cost money and drove down the efficiency of moving money from the public to their wallets. Now, it's way better!


> we were told the profit motive and competition would make them efficient.

They believe their own propaganda unfortunately.


I just find it odd that people would still defend them like this. They don't seem to realise you can buy boot polish in tins nowadays. but maybe they like the fresh taste of dirt on it.


Indeed, this was taught to me in the late-90s in A-level Economics as absolute undisputed fact. The path forward had become clear whereas it hadn't been understood previously. It annoys me now, looking back and knowing it's such an incredibly naive take on how capitalism works. Was it naivety on the part of the teacher, or propaganda slipped into the curriculum? I don't know.

A separate issue worth mentioning is that the water companies (as opposed to trains, gas, electricity, Royal Mail, etc) don't fall under this because they were privatised as regional monopolies. The government didn't even (pretend to) attempt to create competition.


the thing is, even the rail companies don't usually have competition - they each carve up one part of the country or one particular line and all the prices are exorbitant. it shouldn't be cheaper to fly across the country.


> But if all we needed was to hold hands and sing kumbaya then Africa would be Wakanda.

Are you under the impression that the problem African nations are facing is that they're holding hands and singing too much? Are the Africans just lazy?


Wikipedia is not a pure hosting operation, it's trying to foster a worldwide community-of-practice of volunteer contributors that can be sustainable in the long term, and that does take quite a bit of spending. I have no idea why so many people keep getting this wrong.


> "I have no idea why so many people keep getting this wrong."

To me it seems a perfectly natural effect of nearly everyone using it as a website which holds lots of information, and very few people comparatively have any experience with the community side, so people assume that what they see is what Wikipedia is.

Not many people are spending time reading reports on organisation costs breakdowns for Wikipedia, so the only way they'd know is if someone like you actively tells them. I personally also assumed server costs were the vast majority, with legal costs a probable distant second - but your comment has inspired me to actually go and look for a breakdown of their spending, so thanks.

Edit: FY24-25, "infrastructure" was just 49.2% of their budget - from https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_...


Wikipedia is also uniquely cacheable.

I suspect that 95+% of visits to Wikipedia don't actually require them to run any PHP code, but are instead just served from some cache, as each Wikipedia user viewing a given article (if they're not logged in) sees basically the same thing.

This is in contrast to e.g. a social network, which needs to calculate timelines per user. Even if there's no machine learning and your algorithm is "most recent posts first", there's still plenty of computation involved. Mastodon is a good example here.


The move away from "most recent posts first" is because that's actually harder at scale than the algorithmic timeline.


As a former Wikipedia admin, I think the best way to think of it is as a massive text-focused battle MMORPG that happens to produce an encyclopedia as a side effect.


Yep, the encyclopedia is the not-so-wasteful "proof of work" part of the MMORPG. It's a game, but you grind it by working on generally useful stuff.


Haha and with battles in the form of massive flame wars?


> holds lots of information

But they want that information to be at least kept up to date and hopefully to improve over time, right? That's what the community is for. It's not a free lunch.


I wasn't insinuating any sort of judgement, from myself or from the vague general public that I referred to; just commenting on which parts are particularly visible & thought about.

Edit: I wasn't going to say anything, but then noticed you're the same person I was replying to before, so I will since it's more than once - in both your comments you seem to feel that you need to defend Wikipedia but in both cases there was nobody attacking them :)

I appreciate that internet comments can often contain lots of hostility, but I encourage you to remember that it's not a default state, and that often comments are just good faith opinions without an angry subtext. In both cases you could have just written as if adding some interesting information, rather than as if you're countering an anti-Wikipedia campaign. (And I'm not trying to attack or criticise now either, sorry if it comes off that way - just constructive feedback!)


The Wikimedia Foundation is a full-fledged cloud services provider. They host applications and developers on their cloud platform. These developers have been working with AI and scripted solutions for a long, long time. ClueBot is the premier example of an AI- (ML)- powered solution to combat vandalism.

So Wikipedia is not merely a "cloud app with cloud storage" but it is a first-class cloud-based platform: the English project is merely the largest and best-known, but there are hundreds, hundreds of other projects hosted on WMF's cloud services. And the developers and the bot operators who run in the backend are hardly detectable by the end-users or even the everyday editors, but they are also the backbone of WMF services, and they are supported by WMF admins and developers, to run their applications that support editors and wiki admins in their duties.


I love libraries and museums, but I think that Internet Archive has done an incredible job.

If I didn’t have a job or responsibilities and was told that I was allowed to just be curious and have fun, I would spend a tremendous amount of time just reading, listening, watching, playing, etc. on IA.

Visiting IA is the closest feeling I can get to visiting the library when I was young. The library used to be the only place where you could just read swaths of magazines, newspapers, and books, and also check out music, all for free.

Also, I love random stuff. IA has digitized tape recordings that used to play in K-Mart. While Wikipedia spends time culling history that people have submitted, IA keeps it. They understand the duty they have when you donate part of human history to them, instead of some person that didn’t care about some part of history just deleting it.

IA is not just its storage and the Wayback machine, even though those things are incredible and a massive part of its value to humanity. It’s someone that just cares.

At the end of the day, big companies just need to make profit. Do big companies care about your digitized 8-track collection you have in cloud storage? One day maybe they will take it away from you to avoid a lawsuit or to get you to rent music from them.

And your local NAS and backups? Do you think your niche archive will survive a space heater safety mechanism failure, a pipe bursting, when your house is collateral damage in a war, or your accidental death? I understand wanting to keep your own copies of things just-in-case, but if you want those things to survive, why not also host them at IA if others generally would find joy or knowledge from them?


My lil NAS won't survive, but do you also believe the IA's San Francisco office will survive "the big one" when it hits the San Andreas fault? Geographically redundant storage is the only way to do it, and that goes for installations big and small.


IA has redundant backups in Europe


I'm surprised no-air-conditioning datacenters aren't more common. It's a huge cost, and people love to complain about related water usage. I recall some Microsoft employees running a similar experiment years ago:

https://web.archive.org/web/20090219172931/https://blogs.msd...


I don't think it's really fair to compare IA to a real library. The Seattle public library, for example, spends 76% of its operating budget on employees, most of whom are doing public services work. The second major expense for a real library is paying for books and materials; again, IA doesn't do any of that.

It's not fair to compare an institution with a website.


I thought the comparison was unfair as well.

Physical libraries also tend to be the de facto life help desk for a lot of people out there.


They’re both institutions, but one wants recognition and nice buildings, while the other wants to be an immutable archive, unlike Wikipedia, which curates and memory-holes issues that don’t align with its thinking. The other one just marches on without flashy managers at the helm making life easy for themselves.


Seattle public library is also an archive as well as a provider of many beautiful and free third spaces. The downtown library is very cool. I bet there’s stuff in the stacks there that is not digitized anywhere.


The nice buildings provide a public space sheltered from the elements to millions of people a year. I love the IA but it really isn't a worthy comparison.


> Wikipedia spends $185m per year

Only a small fraction of that is spent on actually hosting the website. The rest goes into the pockets of the owners and their friends.

You can do a lot with very little if your primary goal isn't to enrich yourself.


Do you have a source for that?

Being a 501(c)(3), they're required to disclose their expenditures, among other things. CN gives them a perfect score, and the expense ratio section puts their program spend at 77.4% of the budget: https://www.charitynavigator.org/ein/200049703#overall-ratin...

Worth mentioning that Wikipedia gets an order of magnitude more traffic than the Internet Archive.


In their latest available annual report, the Wikimedia Foundation reported that in 2024 they brought in $185M in revenue/donations, of which they spent $178M. Of that $178M, $106M was spent on salaries and benefits, and $26M on awards and grants. So, that accounts for 75% of their spending. "Internet hosting" is listed at only $3M though there are other line items such as "Professional service expenses" at $13M that probably relate to running Wikipedia too.

Scroll down to the "Statement of activities (audited)" section:

https://wikimediafoundation.org/annualreports/2023-2024-annu...


> $106M was spent on salaries and benefits

…across 650 employees, which is about $163K on average.


> Worth mentioning that Wikipedia gets an order of magnitude more traffic than the Internet archive.

With over two orders of magnitude less data to host, though. The entirety of Wikipedia is less than 1PB [1], while the entirety of IA is 175+ PB [2].

Traffic is relatively cheap, especially for a very cache-friendly website like Wikipedia.

[1] https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

[2] https://archive.org/about/


Wikipedia's actual hosting is not expensive and never has been.

https://wikimediafoundation.org/who-we-are/financial-reports...

Look at last year's audited financial report.

$3,474,785 was spent on hosting, which makes sense; it's basically a static site.

This is out of expenses of $190,938,007.

That's about 1.8%. This is not new; it's been the case for years. Wikipedia has never had very high hosting costs. The money has always gone into their grants or whatever else.

Despite the nonsense about AI overloading their servers, even if it doubled the load it would barely affect the budget.
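For anyone who wants to check the arithmetic, a quick Python sketch using only the two figures quoted above. The idea that hosting cost scales linearly with load is my assumption for the "doubled load" scenario, not something the report states, and the function names are just for illustration.

```python
# Figures quoted above from the audited financial report.
HOSTING = 3_474_785
TOTAL_EXPENSES = 190_938_007


def hosting_share(hosting, total):
    """Fraction of total expenses that goes to hosting."""
    return hosting / total


def total_if_hosting_scales(load_factor, hosting=HOSTING, total=TOTAL_EXPENSES):
    """Total expenses if only the hosting line grows, linearly, with load (an assumption)."""
    return total - hosting + hosting * load_factor


share = hosting_share(HOSTING, TOTAL_EXPENSES)
doubled_total = total_if_hosting_scales(2.0)
increase = doubled_total / TOTAL_EXPENSES - 1

print(f"hosting share today: {share:.1%}")
print(f"budget increase if load doubles: {increase:.1%}")
```

Under that assumption, doubling the load adds one more hosting bill on top of the existing total, so the overall increase is the same small percentage as the hosting share itself.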


My countdown to donating to Wikipedia when a random MAGA nerd makes some baseless claims is getting close. When Elon had his little rant a couple of years ago it got triggered as well.


A fool and his money are soon parted.



