How much your computer can do in a second (2015) (computers-are-fast.github.io)
288 points by MindGods on July 11, 2020 | hide | past | favorite | 180 comments


Computers are also slow, and sometimes getting slower, because real-world applications are built with increasing layers of abstraction. While the hardware at the bottom layer is getting faster, the top layer is sometimes slower because there are so many layers and so much complexity in the stack.

Take this review of terminal emulator benchmarks: https://lwn.net/Articles/751763/

Uxterm, released in 1994, is the clear winner.

A "modern" terminal like Gnome Terminal has more than 10x the latency.

Booted an Electron app lately?


A few extra seconds of app startup time or an extra frame of latency in your terminal is a small price to pay for the absolutely massive difference in functionality of newer apps.

It’s fun to see how fast apps were when everything was extremely simple, but you can’t dismiss the fact that it was orders of magnitude less capable than modern software.

The bottom line is that modern apps are fast enough. I’ve never once thought that my terminal latency is too high, or that a few extra seconds of app startup time made any impact on my day whatsoever.

The difference between the fastest and slowest terminals on that list is less than 30ms, or 3/100ths of a second. Literally less than a blink of an eye. It’s a fun study and I enjoyed the article, but the reality is that it just doesn’t matter at all.

Anyone suggesting that computers felt faster a decade ago is misremembering the past. Try using a 2008-era computer with a mechanical HDD and you’ll quickly realize that apps did not launch faster back in the day.


Modern programs are 100 times slower, but they definitely aren't 100 times more functional. And speed is a part of functionality and UX! I don't want to have to use a high-powered workstation or a gaming laptop to get reasonable responsiveness in my text editor. Many Javascript-powered websites with 'fancy' canvas features are janky and slow as shit on hardware that's only 10 years old, whereas websites that use plain HTML are still snappy and fast.

That 'nothing at all' matters when our current culture involves building layers and layers and mountains of stacks on top of software. That 30 ms compounds, because everyone working on each layer thinks 'hey, this is a bit slow, but it doesn't matter because it's only milliseconds.' Given this, it's obvious that modern software will be as slow and inefficient as it possibly can be before normal people notice and complain (usually only because they've used other equivalent software that's smoother and faster- it's easy to get used to jank, like low FPS in a video game.)

I think that people undervalue the cost of all these layers of abstraction.


I think users have a performance budget and once you are inside that budget, there are rapidly diminishing returns for faster speed.

> it's obvious that modern software will be as slow and inefficient as it possibly can be before normal people notice and complain

That's sort of circular. If people don't complain, there are probably better places to spend engineering effort. Besides, the slower speed is traded off for something: a safer abstraction, faster development, features the users are requesting, better security, etc.

I think some of us on HN have higher expectations for software performance than "normal" users, so we are the first to complain (or use alternate, faster software).


That's because "normal" users don't know any better. They don't know what their hardware is capable of, so they don't know that their software is slow, so they don't complain about it. If they do they usually blame hardware.


Unfortunately, from some of the other comments in this discussion, it seems a lot of developers don't know any better either --- they've never seen what real "fast" is, so they think the amount of latency they experience is normal.


I sympathize with your comment... But at the same time I'm likely in the crowd of the unknowing. Could you suggest examples, or a way to get an idea of how fast computers can be?

I've recently been amazed when cracking an old cryptanalysis puzzle at what Python can do in very little time. But I have no idea what a responsive OS feels like on a general standpoint. Should I try to compile Firefox on Windows 3.1? (Just kidding on this last one, obviously.)


Right. Normal users think things like, "my computer is slow, it must be getting old, time to buy a new one."


That's because normal users simply don't care nor have a reason to care.


Just because an average user isn't able to quite put a finger on their frustration and its source, doesn't mean it isn't there. Studies show that even though users often won't be able to see that it's the performance of the program that is infuriating, they will be more nervous using it anyway and will prefer to use a faster alternative.

When phones with touch screens entered the market, we'd often put up with the latency of touch interaction, but it was irritating nevertheless. Then the early iPhones showed how low the latency could be and how much more pleasant using the device is. iPhones have degraded in this regard since then, and Android phones haven't caught up even to the current iPhones. And I'll never use an Android, one of the main reasons being exactly this: latency.


> I think users have a performance budget and once you are inside that budget, there are rapidly diminishing returns for faster speed.

"Budget" is kind of a strong word. For non-hard-real-time systems, there are various points of diminishing returns on an exponential curve. However, very few interactive programs I've ever used have ever gotten to an acceptable point on that curve, given their features and the performance of my hardware. Android should not take three seconds to drop down the top drawer on a Snapdragon 617. Flutter should not introduce a noticeable latency for typing ASCII characters. Emacs should not take several seconds to display my which-key bindings. Gmail should not take ten seconds to load.

> If people don't complain there's probably better places to spend engineering effort.

For the majority of software, what's actually happening is that people don't realize that "better" is even possible.

> the slower speed is traded off for something: a safer abstraction, faster development, features the users are requesting, better security, etc.

You can make better tradeoffs, and you can make worse tradeoffs. The vast majority of modern software trades off an unacceptably large amount of performance for an unacceptably small amount of all of those things, to the point where those aren't tradeoffs - they're excuses for abysmal engineering discipline.

Additionally, most of those tradeoffs aren't actually necessary. Rust gives you speed, memory safety, and some better security all at once. Common Lisp gives you speed, memory safety, better abstractions, and faster development time. The vast majority of features shouldn't actually have a performance penalty if not being used - for instance, if I don't use the "export to PDF" feature in an office suite, there's absolutely no good reason for it to have any performance impact aside from possibly a few milliseconds of startup time. Finally, the vast majority of software slowdowns that I've experienced have had nothing to do with security at all.

> I think some of us on HN have higher expectations for software performance than "normal" users

Or we're much more aware of what hardware is actually capable of.


> Modern programs are 100 times slower, but they definitely aren't 100 times more functional.

I don't know if this matters. My terminal is fast enough that I've never thought about its latency. If my computer doubled in speed, I'd be happy for all of that to go to a 10% increase in terminal functionality, because otherwise it's getting wasted speeding up a task I can't tell is taking any time at all.


Fine, your computer can render textual characters quickly enough for you. OK.

People do other things with computers. Draw. Make music. Edit music and video. Play twitch games. Get them to understand and respond to speech.

The extra layers of abstraction--specifically, the latency caused by the layers--really hurts these activities.

Saying that today's computers are better than those of the 1980s is very weak tea. They weren't good enough then, and they're still not good enough now.


> People do other things with computers. Draw. Make music. Edit music and video. Play twitch games. Get them to understand and respond to speech.

But these compute-intensive tasks are more often than not implemented in lower-level, highly optimised code like BLAS, CUDA, etc. I think your overall point is valid though -- I'm looking at the Gmail tab in Chrome -- but maybe not for these particular examples.


If your software doubled in speed, you could use a processor that needs only a fraction of the power your current processor needs. Computers consume on the order of 1% of all global energy.


> Modern programs are 100 times slower, but they definitely aren't 100 times more functional.

My IntelliJ IDE is easily 100 times more functional than early text editors.

As for slower: Any relative comparisons are still missing the point. It’s not a relative question. It’s an absolute question: Is this fast enough?

Let’s be honest, none of us are sitting around waiting for our complex IDEs to render characters on the screen. No one really cares if it’s 1ms or 30ms or even 200ms.

Likewise, I don’t care if it takes 100ms or 10s to open the IDE because I’m not quitting it and re-launching it all day. I launch it, leave it open, and that’s that.

People like to glorify the good old days when apps were supposedly faster to launch, but they forget the convenience of simply leaving apps open with our oodles of RAM and letting our computers rely on suspend/wake. The longest delays in my workflow are getting my laptop out of the bag and typing in the password, or maybe downloading things from the internet.

Terminal latency or app launch time just don’t even register on the list of delays during my day.


> Let’s be honest, none of us are sitting around waiting for our complex IDEs to render characters on the screen. No one really cares if it’s 1ms or 30ms or even 200ms.

Waiting longer for a character to render to the screen than it takes to send an IP packet to a different continent and receive a reply may make you happy for some reason, but UI research indicates most people will react negatively to this.

It's worse than that. With something like Visual Studio, for example, we are sitting around waiting for local variables to update in the debugger's watch window, if we're experienced enough to know that we need to do that. This kind of thing is a genuine user interface bug: if a person doesn't realize what a slow, awful piece of software they're dealing with, they'll step step step past a variable update in the debugger, never see it happen on their screen because that only works when you pause and give the watch window time to catch up, and assume it's their program that has the problem.

People can get acclimated to dealing with very slow software, but they shouldn't have to when all the hardware performance to make it better already exists.


> No one really cares if it’s 1ms or 30ms or even 200ms.

You would definitely care about 200ms latency.


Have you looked at actual end-to-end latency numbers for common operations? I’m not talking about theoretical transfer times between buffers or carefully structured synthetic benchmarks.

Using fast-twitch PC games as the gold standard, most people are looking at 60-80ms of latency on a local PC. The online game streaming services hover around 150ms, which is noticeable but still entirely usable. (Source: https://www.pcgamer.com/heres-how-stadias-input-lag-compares... )

That’s why I say that 1ms vs 30ms of terminal latency is a non-issue. When I’m typing, I’m not on a tight feedback loop with each character. I know what I’m typing, so I’m not waiting for specific letters to appear on the order of a blink of an eye.

Most of us are using 60Hz monitors (17ms per frame) with terminals that have 20-30ms of lag, with OSes that introduce slightly more delay, and so on. Then we SSH into remote servers with 50ms, or 100ms, or 200ms of latency or more. The total delay between hitting a key and seeing the letter on screen could easily be 200ms on the regular for an SSH session, yet our typing isn’t falling apart.


That PC Gamer article is not an endorsement of the idea you're pushing that all this latency doesn't matter. "Singleplayer games are mostly fine to play through the cloud, but any cloud gaming platform is going to be a no-sell for people who only play multiplayer games, even with a good connection."

I don't want to give that magazine article too much credence, but when it comes to user experience, we as an industry ought to try for more than "mostly fine" or "isn't falling apart."


Typing at 100 wpm is a character about every 120 ms.

A 200 ms delay would be intolerable, and a 30 ms delay would be a 25% penalty on the time to see each character appear.
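The arithmetic behind those figures can be sketched in a few lines of Python (assuming the usual 5-characters-per-word convention):

```python
# Inter-keystroke interval at 100 wpm, and latency as a fraction of it.
WPM = 100
chars_per_sec = WPM * 5 / 60          # ~8.33 characters per second
interval_ms = 1000 / chars_per_sec    # 120 ms between keystrokes

for latency_ms in (2, 30, 200):
    print(f"{latency_ms} ms latency is {latency_ms / interval_ms:.0%} "
          f"of the inter-keystroke interval")
```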


Stella Pajunas's 1946 record of 216 wpm (54,000 strokes per hour, 900 per minute, 15 per second, 66 ms) is an average. She must have typed considerably faster some of the time. Anything over 40 ms is probably noticeable.

https://www.pond5.com/stock-footage/item/75268195-miss-stell...

Most wouldn't get near 1/3 of that, but when the professionals say your shit doesn't work, it does mean something. Trying to increase your speed using a system that can't keep up adds unacceptable annoyance.

I hereby coin the Pajunas rule at 40 ms.
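For what it's worth, the stroke-rate chain in those parentheses checks out; a quick sketch, taking only the strokes-per-hour figure as given:

```python
# Verify the quoted per-minute, per-second, and per-stroke figures.
strokes_per_hour = 54_000
per_min = strokes_per_hour / 60    # 900 strokes per minute
per_sec = per_min / 60             # 15 strokes per second
interval_ms = 1000 / per_sec       # ~66.7 ms per stroke
print(per_min, per_sec, round(interval_ms, 1))   # prints: 900.0 15.0 66.7
```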


> Let’s be honest, none of us are sitting around waiting for our complex IDEs to render characters on the screen. No one really cares if it’s 1ms or 30ms or even 200ms.

I actually do, because I'd like for IntelliSense and clones to save me some typing, but often that doesn't quite work out, because they're too slow (inconsistently slow, too). I'd also like it if my IDE or Word editor could highlight errors in realtime, but in reality it's more like a couple seconds delayed, so I have to navigate a significant distance back to correct them, instead of being able to fix it almost on the spot.


That’s a different topic, though. The OP posted about typing latency, not the latency of additional complex features.

It’s not exactly fair to claim that software is “slower” than old software when the software you’re comparing to doesn’t have those features at all.


UI latency is UI latency, and the problem in both cases is due to what markstos mentioned in his initial comment: the increasing internal complexity of the software stack used to develop the software. It's not like modern hardware isn't fast enough to do literally everything that needs to be done locally with minimal UI latency.


I'm guessing 1000x


> Let’s be honest, none of us are sitting around waiting for our complex IDEs to render characters on the screen.

I see you are not much into modern IDEs.

Visual Studio used to be a great perpetrator here. The 2017 version doesn't do it much, but I'm not rolling the dice on the 2019 one as far as I can avoid.

I do sit around waiting for Atom to interpret my keystrokes all the time, but it's never much more than a couple of seconds.

Oracle's SQL Developer is another one that can't keep up with the keyboard. It's not as slow as Visual Studio, but it still breaks the flow. And it loses characters when it's blocked.


> Let’s be honest, none of us are sitting around waiting for our complex IDEs to render characters on the screen. No one really cares if it’s 1ms or 30ms or even 200ms.

You seem to have never worked over an SSH connection halfway around the world. Typing faster than your text appears on the screen is annoying.


Or played with a 300 baud modem


> Let’s be honest, none of us are sitting around waiting for our complex IDEs to render characters on the screen. No one really cares if it’s 1ms or 30ms or even 200ms.

I do regularly on my 2019 MBP - I 'only' have the basic SSD and 'only' 16 gigs of memory. Usually Jetbrains stuff, but things freeze up/pause on VSCode and Eclipse as well too.

I'll then be told "oh, must be your hardware", but this is a "feature" I've noticed of... not just JVM software, but probably MacOS in general - sporadic system pauses. SSD makes it more bearable, certainly, but it's not "this" hardware - I've had probably 8 different Mac systems since 2008, and they all do this. But... in the 15+ years of Windows and Linux systems before that, I saw similar issues. I see it on other people's systems too, but they seem to be able to bite their tongue, or genuinely don't notice (or don't care?)


Are you talking pre 1980’s as early? Because IntelliJ vs EMACS is hardly a 100x improvement.


If you don't care, that is fine. But please don't spoil it for the ones caring. Don't argue against it if you don't care


> My IntelliJ IDE is easily 100 times more functional than early text editors.

Maybe more than text editors, but I doubt it's even twice as functional as Eclipse circa 2005. As slow and bloated as it was for its time, it had all the modern features and ran on machines with (I think) 128MB of RAM.


The OOM killer regularly kills IntelliJ on my 16GB RAM Macbook pro when I have the audacity to leave it open while I try to compile something else. That really sucks, so I use vim whenever I'm familiar enough with the code to not need very intelligent autocompletion.


I can definitely feel the difference getting my terminal echoes over 40ms of latency and feel impeded. Not ridiculously impeded, but...


Is the browser able to render all that JavaScript code in parallel on multi-core CPUs?


The market decides how much latency is acceptable.

Computers are made to do work, not stroke some purist programmer ego. Computers and the software they run today enable us to do far more than we used to. If they didn't, we wouldn't bother.


> Computers and the software they run today enable us to do far more than we used to.

This is true in the broadest sense -- certainly calc.exe is better than an abacus! -- but I'm not sure it's true on the 10-to-20-year timescale people are usually thinking about when they say software is declining. I use exactly zero features of my computer that were not available in Windows Vista, released in 2006.


Look at Thunderbird since 2006 to see how the product has grown. Sure, we all used "email" in 2006, but email clients were terrible.

* 2007 - message tagging https://website-archive.mozilla.org/www.mozilla.org/thunderb...

* 2009 - search result filtering https://website-archive.mozilla.org/www.mozilla.org/thunderb...

* 2010 - mail account setup wizard https://website-archive.mozilla.org/www.mozilla.org/thunderb...

* 2012(ish?) Support to send large attachments via 3rd party, I.E. box.com https://website-archive.mozilla.org/www.mozilla.org/thunderb...

* 2013 - ignore threads https://website-archive.mozilla.org/www.mozilla.org/thunderb...

* 2014 - attachment reminders https://www.thunderbird.net/en-US/thunderbird/31.0/releaseno...

* 2016 - subject line spell check https://www.thunderbird.net/en-US/thunderbird/45.0/releaseno...

* 2018 - message templates https://www.thunderbird.net/en-US/thunderbird/60.0/releaseno...

* 2019 - mark all folders for account read https://www.thunderbird.net/en-US/thunderbird/68.0/releaseno...


I'm not saying that newer versions of Windows than Vista don't include additional features. Off the top of my head, Windows 10 includes Cortana and Paint 3D. I just don't ever use or care about them, just like how I've never had occasion to "mark all folders for account read".

(And wow, I guess I was right to buy Microsoft, because they were eleven years ahead of the curve on message templates: https://support.microsoft.com/en-us/office/send-an-email-mes... )


Hmm, now I'm curious. What does the software stack you are using in 2020 that hasn't had a new feature (that you use) since 2006 look like?

I can't tell if you are a linux user running a command line mail reader or if you use Win10 to run modern web sites in the browser.


This smacks of capitalist supply theory that is completely reductionist, and it doesn't account for outside concepts like vendor lock-in, price tiers, etc.


Try using a 2008-era computer with a mechanical HDD and you’ll quickly realize that apps did not launch faster back in the day.

I'm still using one! And it's fast enough for what I do with it --- mainly native development in C, but also the usual communications: email, IM, watching videos, reading content-focused web sites (not web apps), etc. Since I don't use anything bloated, 4GB of RAM is more than enough and the HDD remains silent most of the time.

I experience far more delay and frustration using a much newer, ~1-year-old work laptop with much faster CPU and several times the RAM, but with Electron apps or any other "modern crap" that has a fraction of the functionality. The most vivid recent example is the YouTube "new design" --- it is disturbingly slow and dumbed-down. On this new machine I still experience lag often when interacting with bloated JS web apps, which can't even keep up with my typing speed (web-based Outlook is a notable example of this.)

Even presumably-native applications are not immune to this cancerous growth. UWP is another example; the Win10 calculator somehow needs a loading screen, and so does the Settings app, whose featureless pathetic excuse of a UI somehow manages to be less responsive than the classic Control Panel, which opens basically instantly and is just as responsive to input.

Related: https://news.ycombinator.com/item?id=18506170


> the Win10 calculator somehow needs a loading screen

This right here made my day. No further comment on the state of technology today needed.


A really good typist runs around 100 words per minute. 30 milliseconds of delay vs 2 is enough that the terminal won't keep up with the person working on it. And these are TERMINALS; other software has gotten much worse in typing latency.


No one is typing 100WPM in a terminal though. If you're writing a document use a word processor that's optimised for input speed more than your terminal app, or use a faster terminal app with fewer features that slow it down. Leave the terminal apps that can only handle 50WPM alone so the rest of us can have the useful stuff.


I prefer Vim for writing. Plenty of writers prefer text editors to word processors for writing.

No need to be either/or. Fast AND useful is possible.


Are you saying that Microsoft Word has a lower latency than Emacs in a terminal?


I can type prose at ~150WPM pretty much continuously, with bursts of 200+ in short phrases.

I have never measured how fast I type in a terminal, but it's definitely far more bursty, and that's when lag becomes far more unacceptable.


I think you’ve misinterpreted these results. The delays don’t mean that the terminal will start missing keys or won’t keep up with the operator.

Humans don’t type on a tight feedback loop of waiting for characters to pop up on the screen, obviously.

We also need to keep the big picture in mind. If you’re using a 60Hz monitor, it’s only going to update every 17ms anyway. Add buffering and even pixel transition latency to the equation and the end-to-end latencies are even higher.

You’re not going to notice the difference between a terminal editor with 20ms of latency and one with 15ms of latency.


> Humans don’t type on a tight feedback loop of waiting for characters to pop up on the screen, obviously.

Actually, yes, they do. I can't dig up the reference right now, but evidence indicates that for touch typists, error rates rise as input latency increases.

Apparently, seeing your characters appear on screen is a cue the brain uses together with mechanical feedback and other things to do small muscular corrections on the fly.

So there's nothing obvious about a claim to the contrary.


> Actually, yes, they do. I can't dig up the reference right now, but evidence indicates that for touch typists, error rates rise as input latency increases

On what magnitude, though? Can you please share the actual paper?

My claim was that 15ms vs 20ms input latency (actual numbers from the parent comment’s link) won’t make a difference. That shouldn’t be controversial, because all of our standard 60Hz monitors only refresh the frame every 17ms anyway.

For reference, blinking your eyes takes about 100ms. The difference between 15ms and 20ms latency is only 5ms, or literally 1/20th of the blink of an eye.

That difference isn’t going to change your typing error rate.

1 second of input latency will, but that’s orders of magnitude higher than anything discussed here.


Nah, I've written texts in certain word editors and IDEs and looked up mid-paragraph for spellchecking or re-reading, just to see the computer still spurting out letters on the screen slower than a half-asleep OM morsing it.

Sometimes these multi-core 4+ GHz fast marvels of nanotechnology simply can't keep up with a meatbag on caffeine punching keys for food. Sad in a poetic way.


That’s an entirely different issue. The original link discussed typing latencies in the range of 1-30ms. That’s a fraction of the blink of an eye.

If your computer is so slow that it takes multiple seconds for letters to appear, something else is happening. Even old “fast” software is going to feel laggy if your system is so bogged down that it can’t keep up with typing inputs on that level.


I'm taking issue with latency in general and its dismissal. There are reams of ergonomic targets, data, and guides on user interface latency that are just completely ignored these days. A terminal is close to the simplest possible interface, whose job is to provide input and output to a user with as little disruption as possible. 30 ms is in fact noticeable.

Here is a short video from Microsoft (about 10 years old now) on the experience of using a touchscreen with 100ms of delay vs 1ms showing the steps in between. The difference is extraordinary for the user even if the end result is the same. https://www.youtube.com/watch?v=vOvQCPLkPt4


> Here is a short video from Microsoft (about 10 years old now) on the experience of using a touchscreen with 100ms of delay vs 1ms showing the steps in between. The difference is extraordinary for the user even if the end result is the same.

That’s additional input delay, not absolute latency.

More importantly, the visual effect you’re seeing isn’t the perception of input delay. It’s the positional difference between the finger location and the box location. Typing in a terminal doesn’t have the same visual difference. In fact, using a mouse doesn’t have the same visual difference because your finger isn’t right next to the screen for comparison. Try dragging a window across your screen right now. The “lag” is easily on the order of the 100ms shown in this demo, but you won’t find it anywhere near as disorienting because your finger isn’t on the screen for visual reference.

It’s not possible to have 1ms lag unless you have a 1000Hz screen and zero processing time for the inputs.

Compare that to the standard 60Hz displays that most laptops and cellular phones use. At 60Hz, the screen only updates once every 17ms. Add input lag, processing lag, rendering delay, buffering delay, and other delays and it’s nearly impossible to go from input to screen action in under 30ms. Most games are on the order of 60-80ms.
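As a back-of-the-envelope illustration of how those stages can stack up, here is a sketch; the individual stage numbers are purely illustrative assumptions on my part, not measurements:

```python
# Hypothetical end-to-end latency budget for a 60 Hz display.
# Each stage value is an illustrative guess, not measured data.
budget_ms = {
    "input polling / USB": 8,
    "OS + app processing": 10,
    "render + compositor queue": 17,   # one 60 Hz frame
    "display buffering": 17,           # another 60 Hz frame
    "pixel response": 5,
}
total = sum(budget_ms.values())
print(f"total ~= {total} ms")   # prints: total ~= 57 ms
```

Even with generous assumptions at each stage, the sum lands in the 50-80 ms ballpark quoted for games.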

The individual pixels in your LCD can’t even toggle from black to white in 1ms.

A lot of the HN comments here are misunderstanding the time scales involved in all of this. For reference, a blink of an eye is about 100ms.


> The original link discussed typing latencies in the range of 1-30ms.

30ms is close to extremely annoying. You will notice a 30ms latency in a video game when moving around, turning etc.

Having that kind of latency on mouse movement can make it close to impossible to hit fast targets in a shooter for instance, because it will destroy perceived synchronicity of input with the response, confusing your brain and destroying immersion.

The same can be said about navigating in a terminal-based text editor. A few milliseconds of delay can take your brain from "I am the cursor" to "I am controlling this cursor, which responds a short time after I execute a certain action". Frustrating.

What is also super frustrating to me when using vim in a slow terminal is that I will consistently overshoot the line where I meant to place the cursor. I realize most of the latency probably comes from my brain and nervous system, but a few ms can be just enough to make the cursor end up in the next line instead.


> 30ms is close to extremely annoying. You will notice a 30ms latency in a video game when moving around, turning etc.

That’s not true, though. It takes more than 30ms just to render your frame, deliver it to the monitor, and wait for a screen refresh.

PC Gamer recently measured some actual latency results. They found 60-80ms was fairly standard for lag between input and on-screen events: https://www.pcgamer.com/heres-how-stadias-input-lag-compares...

Notably, the online game streaming services have on the order of 150ms of latency. It starts to become noticeable, but it’s still entirely playable for many. If the gamers can handle 150ms of input lag, certainly we can deal with 30ms while typing in our consoles.


> PC Gamer recently measured some actual latency results. They found 60-80ms was fairly standard for lag between input and on-screen events: https://www.pcgamer.com/heres-how-stadias-input-lag-compares....

So, I just made the following experiment. I filmed my screen and mouse at the same time, then grabbed a window and flicked as fast as I could.

On my 120 Hz screen the window starts moving within five to six frames (at 240 frames per second video) of the mouse moving, meaning an end-to-end latency of around 20 ms.

Windows 10, nVidia graphics card with hardware scheduling, Ryzen CPU, using the cheapest 120/144 Hz screen that was available at the time (<200 €).
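Converting the camera-frame counts above into milliseconds is straightforward (assuming the quoted 240 fps recording rate):

```python
# Each frame of a 240 fps video spans 1000/240 ~ 4.17 ms.
camera_fps = 240
frame_ms = 1000 / camera_fps
for frames in (5, 6):
    print(f"{frames} frames ~= {frames * frame_ms:.1f} ms")
# prints: 5 frames ~= 20.8 ms
#         6 frames ~= 25.0 ms
```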


> That’s not true, though. It takes more than 30ms just to render your frame, deliver it to the monitor, and wait for a screen refresh.

For cheap non-CRT screens you are correct. Otherwise, not so much. And even if you're using a cheap screen, latencies of 50ms vs 30ms (typical numbers for vsync on vs vsync off) definitely have an impact on player performance besides just feeling more/less sluggish. See also my post here: https://news.ycombinator.com/item?id=23806493

> https://www.pcgamer.com/heres-how-stadias-input-lag-compares...

That PC Gamer article is horrible and amateurish. There's no mention of what vsync/nvidia driver settings they used, etc. Also, games like Destiny are pretty far from being optimized for low input lag.

With CRTs and properly optimized/coded games like CS you can achieve input lag as low as 5ms (though that's just the time until the monitor begins drawing the first rows of pixels after the change).

Here's a few links on input lags measurements:

http://www.esreality.com/post/2691945/microsecond-input-lag-...

https://forums.blurbusters.com/viewtopic.php?f=10&t=1381&hil...

http://esreality.com/post/2640619/input-lag-tests-ql-csgo-q3...

People doing A/B tests:

https://forums.blurbusters.com/viewtopic.php?f=10&t=1134

And a demonstration that even input lag as low as 10ms can be noticeable:

https://www.youtube.com/watch?v=vOvQCPLkPt4


30ms is the video refresh latency with a typical 60fps monitor and a double framebuffer, without any latency from the application itself. Most videogame players have a similar setup and they evidently don’t find it “extremely annoying”, or they would have stopped playing a long time ago.


>Most videogame players have a similar setup and they evidently don’t find it “extremely annoying”

I do. And it's not just me. There's a reason many players of fast twitchy shooters are playing either without vsync, or vsync + uncapped FPS[1] (+ the newer "low latency" modes of modern drivers, but mileage varies).

This stuff has already been well understood, A/B tested, and documented back in the CS1.6 days. A CS player who hasn't at least put some thought into it or doesn't care is extremely rare. And CS is still topping the steam charts daily at up to a million or more concurrent players.

People have no issues noticing vsync ON vs OFF (well that's easy) or vsync + uncapped vs capped FPS in blind tests. I challenge you to get a friend to perform a blind test with you to try this for yourself.

There's also a reason that reduced input lag is part of the sales pitch of everything from monitors and GPUs to input devices. It's because people do care (irrespective of the effectiveness of what vendors try to sell you). Hardware manufacturers, driver developers, and game studios aren't putting time and money into this because nobody gives a damn.

[1]: vsync + uncapped FPS is called "fast" vsync in nvidia driver settings

Edit: I'm going to give an example of why these numbers matter in fast shooters.

Suppose you pit two players of equal skill against each other, each having a reaction time range of 300ms - 400ms, and whoever reacts first wins. One player is playing with a total input latency (mouse to screen) of 50ms, and the other with a latency of 30ms (these are typical numbers for vsync on vs vsync off).

The player with 20ms less latency is going to win about 70% of the time.
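The ~70% figure pencils out. Here's a small Monte Carlo sketch of the scenario described above (my own illustration, assuming reaction times drawn uniformly from the 300-400ms range):

```python
import random

def win_rate(latency_a=50, latency_b=30, trials=100_000, seed=1):
    """Fraction of duels won by the lower-latency player B, assuming
    both players' reaction times are uniform on [300, 400] ms."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        a = rng.uniform(300, 400) + latency_a  # player A: 50 ms input lag
        b = rng.uniform(300, 400) + latency_b  # player B: 30 ms input lag
        if b < a:  # whoever's input registers first wins
            wins += 1
    return wins / trials

print(win_rate())  # analytically 1 - 80**2 / (2 * 100**2) = 0.68
```

The exact answer for uniform reaction times is 68%, which matches the "about 70%" claim.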


"30 ms here", "one frame of latency there" is exactly how Android (and some PC apps) end up with touch latencies in the triple digits.

FWIW most people who semi-seriously game don't use 60 Hz screens and don't use normal V-Sync. They use 120-155 Hz screens, or increasingly 240 Hz screens and fast v-sync (=driver-managed triple buffering combined with a frame queue length of 0 or 1). They also try to run the game with as high FPS as possible.


I have definitely used software where it started lagging behind significantly while I was typing. That kind of stuff is absolutely infuriating.


That’s a very different problem than the 15-30ms latency discussed in the link above.


> absolutely massive difference in functionality of newer apps

As antepodius already said, this isn't really true.

Slack has a few more features than the chat applications of 15 years ago. I suspect it may also lack a few of their features. It uses vastly more resources.

Visual Studio is a full IDE, with more features than the Electron-based Visual Studio Code.


> Slack has a few more features than the chat applications of 15 years ago. I suspect it may also lack a few of their features. It uses vastly more resources.

Yet Slack has massively more adoption than any previous chat app. That’s not by coincidence: They made it at least an order of magnitude more useful for their target audience.

Slack is, without a doubt, orders of magnitude more complicated than previous-gen chat apps. It’s not like they could have built it with a handful of engineers operating some IRC servers.

However, I was specifically referring to text editors (responding to the OP) vs modern IDEs. As you said, full IDEs have vastly more features than a simple terminal program or basic text editor. That’s where the order of magnitude more functionality comes from.


> Yet Slack has massively more adoption than any previous chat app

Maybe, but just by virtue of there being more internet users these days. You're underestimating just how ubiquitous IRC was; it was near universal. Sure, there was no single app that everyone used, but that was a feature. With Slack we've gained a few features but regressed in so many other ways.


I still use a machine like that. They were faster hands down.

Even faster: a Commodore 64. Flick a switch and the OS is ready. Load GEOS and now you have a basic windowing environment.


Sorry, the opposite is true here. Desktop GUI apps from the 90s were much more advanced, with more features, and more responsive.


At this point I'm not even sure what people mean when they say "slow" or "poor performance", since I have a mid-tier PC and have no problems with VSCode or anything else. Everything starts fast and I have no latency issues when typing. Imperceptible differences in time are not something I would ever consider a performance problem, certainly not in the case of UX.

Same thing with memory usage. If I'm not even using a third of my RAM with my browser, editor, and a few other random things open, I'm not going to complain about something being a "resource hog".


There are two possible reasons for this.

One, you simply may not know better. If you've never experienced the typing latency of an 80s or 90s style system, you may not have a reference point for low latency. Not saying this is necessarily the case for you, but I suspect it's the case for some people.

Second, people genuinely have different perception. The most stark example of this in practice for me was the perception of flicker on CRTs. Many people claimed to just not notice flicker at 60Hz, while for me going back from 100Hz to 60Hz was unbearable for longer amounts of time. I'm sure similar differences exist with our perceptions of latency.


>"few extra seconds of app startup time"

This is atrocious. Barring the cases where the application loads and deciphers some massive dataset upon startup, a normal application should start instantly or close to it.


My desktop is about 10 years old with an old SSD, but /home is on a btrfs raid NAS on spinning disk (over WIFI). It runs a minimalist WM. I paid about $1000 for it (and a few hundred later for the boot SSD, a new video card, and a 802.11ac pcie card)

It is significantly faster at all mainstream tasks than my late 2018 $2700+ work laptop running $VENDOR_DEFAULT desktop environment.


I would be okay and favour the tradeoff... But I recently decided to eradicate gnome-calculator in favour of xcalc because gnome-calculator (probably due to being packaged as a snap) was taking 5-10 sec to show up whereas xcalc is literally instantaneous.


Quite a lot of the new functionality is "value subtraction": adware or spyware that makes the user experience worse.

The recent fiasco where Facebook accidentally broke startup for all apps using their toolkit is a good example.


VSCode, an Electron app, is my favorite application of all time. It starts up quickly and is incredibly responsive in use.

It's easy to customize, there are loads of community extensions, and it's free. When abstractions allow software to have a flourishing community like that, to me it's worth the cost.


Still slower than Sublime Text. I gave VSCode a try and this is the reason I went back to Sublime Text. I already have a ST3 license so I didn't care about "free". Feature-wise and community-wise they are both very good, with maybe a small advantage for VSCode.

It isn't slow but it has noticeable micro stuttering. Enough to be annoying when coming from an editor that doesn't have this problem.

EDIT: Emacs feels surprisingly slow for such old software, though not in the same way as VSCode. Vim is fast but I never really got into it.


I was actually a Sublime Text user before VSCode. I'll be the first to admit Sublime Text is faster. VSCode still wins for me in a landslide. Sublime text's performance is just not enough of an advantage over VSCode's feature set and extension marketplace.

VSCode is free as in freedom, which is pretty important to me for a tool that I completely rely on for work.

Edit: I'll also add that I've never found VSCode's performance to be a problem. It's pretty rare for me to find a file that's big enough to make VSCode choke. I suppose it might happen with multi-gigabyte log files. But usually I just use journalctl to browse those.


Easy way to make vscode choke: hold down ctrl+alt+up/down to enable multi-line editing. Let the cursor run up or down the screen for a bit. It lags a lot.

This might be because of the vsvim plugin though, itself the source of a lot of lag.


Yeah that's very odd. I use that feature a lot and have never had it choke. But I also don't use vsvim.


The killer-app feature for me on VSCode, is the ability to auto save over SSH.

I tried other things like Emacs, in Tramp mode, terrible. Gawd, it’s so terrible. It took me several days to figure out how to set it up. The documentation and examples were poor or non-existent. And when I finally got it to work, I was thoroughly underwhelmed.

I looked for something else, and found VSCode had a feature for SSH. I downloaded the module, connected to my SSH server, and I was operational in a few minutes. The difference between this and Emacs, was night and day.

And VSCode had all the other nice features, like code folding, syntax highlighting, and especially, function definition lookup, which makes it so convenient to work with your functions library.


Emacs has all those properties except fast startup and it's like 40 years old. Added layers of abstraction aren't to thank here.



Wow, thanks for that link! Great stuff.


Emacs is just the VSCode of yesteryear - there's a reason people used to joke about "emacs" standing for "emacs makes a computer slow".


Eight megabytes and constantly swapping


Meanwhile I find VSCode to be a slow resource hog. I can only assume you're simply so used to terrible applications that your standards have slipped.


It's ok to have a different opinion. But you shouldn't assume your experience invalidates my own.

I've used a large number of text editors. Vim, Sublime Text, TextMate. I loved them all, and they are of course much faster and less resource intensive. I would still pick VSCode over each of them when given the choice.

VSCode strikes a great balance. Ironically I find it far more performant and smoother than Visual Studio itself.


> Ironically I find it far more performant and smoother than Visual Studio itself.

It does so much less that it's like comparing Notepad to PyCharm.

Your exact use-case may not be impacted too much, which is great - but I know a lot of programmers who use Visual Studio's entire feature set including a lot of custom C# plugins and it's not comparable at all. (Intellisense, and native step-through debugging/disassembly to begin with).

VSCode is a text/code editor that has some nice bolt-ons.

Visual Studio is a fully fledged IDE with advanced feature sets for large shared projects.


>"Meanwhile I find VSCode to be a slow resource hog"

Same here. It does start instantly on my laptop though


I wish I had a reference for this. I should probably go looking for it.

Someone claimed that the speed of software doubles every 18 years. I presume they meant this from a theoretical/algorithmic standpoint, rather than median behavior of software systems.

As computer speed progress slows, we should expect more effort to go into this kind of work. It seems like we are paying a good bit of attention lately to extracting the embarrassingly parallel parts of our code, and chipping away at the easy end of the shared state spectrum. Amdahl tells us there is gold to be had but the digging gets harder very quickly, so I'm curious to see what will be next, after the great artists steal more of the lessons of Elixir/Erlang, Rust and friends.


I believe that quote refers to compiler optimizations producing a net doubling of performance every 18 years. In effect, “software” doubles in performance.


Could be.

But then like adding lanes to a highway, having more supply just leads to a disproportionate increase in demand, frittered away in a million dependencies that nickel and dime us to death.


Bill Joy talked about scientific computing software advancing like that. Academic papers and new techniques filter down and the same things can be done faster.


You don't get promoted for removing actions.

And now to drink away that depressing thought.


This is so important. I see so many people farting around in the problems of highly scalable distributed cloud systems with thousands of nodes, not realizing that single-digit QPS per node is insanity, why don't you look there first? Computers are fast.


Having spent a lot of time in highly scalable distributed system land, I can perhaps give you few reasons:

1. Sometimes business needs require non-linear growth quickly. One day your perfectly optimized process now requires an n-squared algo across billions of records. Suddenly your one machine is tiny compared to what it used to be.

2. If you haven't scaled the workload horizontally, it can lead to a rewrite just to begin to distribute the load. When you hit the limits of a single machine, it is often a hard barrier that you cannot easily cross.

3. Distributed systems are distributing 3 main resources: memory, IO, and computation. IO especially quickly becomes a bottleneck on a single system, but memory is also quite thorny because memory management itself can be a bottleneck on a vertically scaled system.

4. People with distributed systems do obsess over single-node QPS. If it takes 5k nodes to do work and you can optimize down to 2k nodes, you are saving a lot of money! However, it isn't that simple and this is where being properly distributed gives you cost leverage. You might find that 5 highly scaled machines are more costly than 50 commodity machines, especially in the cloud ecosystems.

5. Finally, and probably to your point and the GP's points, you kind of have to go with the flow when it comes to the level of abstraction people are writing the code at. It takes increasingly specialized knowledge to optimize a process that will run well on a 100gb process (e.g. virtual machine garbage collection issues). You are knowingly sacrificing efficiency for the nice higher level abstractions.

That said, I would emphasize that the abstractions become a smaller slice of the performance pie when you're dealing with algorithmic complexity. C won't magically make your Python algo O(1).

I don't disagree with your main point btw. I think a well done processing / data pipeline should have distributed systems available and single-node computational scenarios available, because there is an undeniable complexity gain when you reach for a distributed system immediately.


I'd argue that throughput that bad usually means you are doing something stupid and easy to fix, like making a blocking call in your single-threaded server process, doing per request what you should have done at startup, not indexing for your query pattern, etc. It's not like you need multiple person-years of Brendan Gregg-level wizardry to serve QPS in the low hundreds from <= 10 boxes. It should just happen, or else you should have a clear understanding of what about your workload is so intense.


True, I imagined while writing the above that people will be thinking of different use cases (or in this case pathological cases). It is very important to profile your system and rationalize all the things it is doing. I have multiple times, for example, seen production systems accidentally bottlenecked by having the wrong level of logging set, such that the primary cost of the system is trace logging, rather than anything having to do with customer value.

I would argue, however, that it has little to do with the decision of whether or not to distribute your system. I've spent a lot of time dealing, for example, with bottlenecked single instance RDBMS instances that are handling load they shouldn't be handling. (For example those accidental recursive queries that are often a side effect of nice ORM abstractions) I totally agree with you, and have seen it happen, that people who do not understand the performance characteristics of their system can reach for a distributed solution before they've understood what their performance issue was. But I've also seen plenty of situations where they reach for bigger hardware for the same reasons.

I think deciding whether or not to distribute means taking a disciplined approach to projecting the business needs of particular data entities you'll be dealing with. For example, if you have a users table, and it will ultimately store every human in the United States, then, well, that is quite do-able on today's single instance RDBMS systems, and you can project the theoretical growth over time. And if you need read load and HA, then you can go a replication route, or at least look at that first before doing something that reduces the quality of your transaction handling, like sharding. And then once the system is in place, taking a disciplined approach to profiling and quantifying the costs of the different aspects of the system and justifying their business value. For example having run large scale recommender systems, a typical decision might be to degrade the quality of an algorithm if it means saving a tremendous amount of money on processing.


Two things: scaling well in a single node environment necessitates separating concerns such that scaling across nodes won’t suck as much. And, C often cuts down O(n^2) algorithms in Python to O(1) because the algorithm is O(1), but Python was doing n^2 memory allocations behind your back.


I think they do this because their priority is decentralization and not speed


No they do this because horizontal scalability is more general. Once you cross the threshold of what your meganode can handle you have to rewrite your code from the bottom up


You can achieve decentralization with a handful of nodes, and you can manage them using simple techniques without inviting the whole large-scale cluster scheduler problem.

What I'm talking about is like 3,000 nodes to serve 15,000 QPS. That seems like a lot of QPS, and so it makes intuitive sense to people that they'd need a big distributed system for it, unless they have a sense of how fast computers actually are.


As this post shows - it really depends on what you're doing :). There are services that can handle 100k QPS on a single machine, no problemo. There are services that can handle as little as 10 QPS per machine. Those are rather extreme cases.

A classic example of a "cheap" service is something like Stack Overflow [1] or Wikipedia: serving content that is mostly static; you just gotta configure your caches well. A classic example of an "expensive" service: a search engine, where so many queries cannot be cached (too rare), or are even completely unique (never seen before [2]).

[1] https://stackexchange.com/performance

[2] https://www.seroundtable.com/google-15-percent-queries-25730...


Stack Overflow caches almost nothing. In fact they have just removed most of their remaining caching[1] with no performance hit. Because, surprise, pulling data from a DB and rendering HTML from it is fast, if you're careful about how you do it. The problem with many developers today is that they don't take this level of care.

[1] https://twitter.com/Nick_Craver/status/1280494336673751044?s...


Right - "those are rather extreme cases." So if you're looking at 5 QPS / node, you should be raising your eyebrows, not automatically reaching for an autoscaling group.


I think people do it because they buy into the hype train of some cloud tech and don't really think about if they need it or not.


Sometimes node count is a function of code organization (micro services), not throughput. And when the instances are collocated (10 on a single host), it’s not like they’re not utilizing the machine's firepower. Individual instances might have low throughput, but in aggregate there could be lots more.


Who do you see doing that?

The places I’ve worked (small/med companies) with lots of instances were always well utilized. People understood well the cost of infrastructure and put effort into minimizing it. But as features and services and customers grow, so do instances.


The empty loop in python was surprisingly slow (68,000,000 iterations per second).

What is it actually doing here? A 3ghz cpu has 3 billion cycles per second. So it's spending an average of 44 cycles to increment an integer and compare???

(also fun fact, python integers are 28 bytes, but it still doesn't really explain the slowness: `import sys; sys.getsizeof(123456)`)
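This is easy to reproduce yourself. A quick sketch (the 3 GHz clock is an assumption; substitute your own CPU's frequency for a real cycles-per-iteration estimate):

```python
import time

N = 10_000_000

start = time.perf_counter()
for _ in range(N):
    pass  # empty loop body, same as the article's benchmark
elapsed = time.perf_counter() - start

per_sec = N / elapsed
# assuming a ~3 GHz core, estimate cycles spent per iteration
cycles_per_iter = 3e9 / per_sec
print(f"{per_sec:,.0f} iterations/s, ~{cycles_per_iter:.0f} cycles each")
```

On recent CPython versions the number will likely be higher than the article's 68M/s, but still tens of cycles per iteration.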


I just tested this on a Linux box I happened to have running, using the "perf" tool. By my measurements, each iteration takes about 104 instructions, of which 23 are conditional branches, and completes in about 31 cycles.

(That's after subtracting about 30 million cycles of startup overhead. Tested with Python 2.7.9 on an Intel i3-4160 processor.)

Remember, Python is a bytecode-interpreted language. Each iteration of the loop involves multiple bytecode operations:

      2           0 SETUP_LOOP              20 (to 23)
                  3 LOAD_GLOBAL              0 (xrange)
                  6 LOAD_FAST                0 (NUMBER)
                  9 CALL_FUNCTION            1
                 12 GET_ITER
            >>   13 FOR_ITER                 6 (to 22)
                 16 STORE_FAST               1 (_)
    
      3          19 JUMP_ABSOLUTE           13
            >>   22 POP_BLOCK
            >>   23 LOAD_CONST               0 (None)
                 26 RETURN_VALUE
Executing each of those instructions means fetching it, jumping to the implementation of the appropriate opcode, and then updating the VM's state -- or, in the case of FOR_ITER, calling the native C function that advances the iterator.

Frankly, it's impressive that it's as fast as it is.


> tested with python 2.7.9

Oof.

https://www.python.org/doc/sunset-python-2/


Python might have changed since this was written, but on my machine, with any number, it finishes in 0.0s!

My understanding was that to access a variable, python does a dictionary lookup to match the name to a memory location, which would explain what happens with an older python.


That happens for global lookups, but methods should use a stack-like lookup for local variables which should be much faster IIRC

(Personally, I really question the extremely-dynamic-by-default design of that generation of languages; is being able to treat globals as a dictionary occasionally worth forcing a key lookup on every single global use?)
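The local vs global difference is easy to measure. A sketch of my own (not from the article), comparing a loop that reads a module-level name (a LOAD_GLOBAL, backed by a dict lookup) against one that binds it to a local first (a LOAD_FAST, an indexed read from the frame's locals array):

```python
import timeit

G = 7  # module-level; reading it inside a loop is a LOAD_GLOBAL each time

def global_sum(n):
    s = 0
    for _ in range(n):
        s += G  # global lookup on every iteration
    return s

def local_sum(n):
    g = G  # bind once; further reads are LOAD_FAST
    s = 0
    for _ in range(n):
        s += g
    return s

t_glob = timeit.timeit(lambda: global_sum(1_000_000), number=1)
t_loc = timeit.timeit(lambda: local_sum(1_000_000), number=1)
print(f"global: {t_glob:.3f}s  local: {t_loc:.3f}s")  # local is typically faster
```

The "hoist a global into a local" trick is a classic CPython micro-optimization for hot loops, though recent interpreter versions have narrowed the gap with inline caches.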


Python integers are also not limited in size and you also run the overhead of the interpreter.


> How many times can we download google.com in a second? Exact answer: 4

I was a little surprised by this one. Sure, network connections are going to be much slower than local operations, but where's the time going? Is it mostly the network latency? Google has a lean webpage, so I assume it has nothing to do with google specifically.


I think those were synchronous requests one after another. 250ms per page load sounds about right.


This also surprised me. A large part of it I'm sure has to do with the fact that I'd assume (given these numbers) urllib2 auto-follows redirects. The request is to `http://` so it must make an additional request. Otherwise I'd expect the numbers to be much higher as there would be no 'body' to the responses.


If your average connection has 100 ms (50 ms would be extremely good but doable these days) of round trip latency to Google.com, then you have a hard upper limit of 10 loads per second just to connect to the system. Then you have to move the data, store it in memory, parse it, and run it.
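The arithmetic behind that hard upper limit is just round trips times RTT. A sketch (the round-trip counts are my own ballpark assumptions, not measurements):

```python
# Sequential fetches are bounded by round-trip latency alone,
# before any transfer, parsing, or rendering time.
def max_fetches_per_sec(rtt_s, round_trips):
    return 1.0 / (rtt_s * round_trips)

# 100 ms RTT, 1 round trip just to connect -> at most 10 connects/s
print(max_fetches_per_sec(0.100, 1))  # 10.0

# 50 ms RTT, ~3 round trips for a cold HTTP fetch -> ~6.7 pages/s
print(max_fetches_per_sec(0.050, 3))
```

Bandwidth doesn't enter into it at all; for small pages fetched serially, latency is the whole story.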


I'd be shocked if my latency to Google were even as high as 50 ms. Though downloading will take three round trips for http and four for https, which will add up if the connection isn't kept alive after each download.

Edit: just tested and got these numbers for myself:

7-13ms to ping google.com

50-80ms to curl http://google.com

90-130ms to curl https://google.com

140-160ms to curl http://google.com http://google.com http://google.com http://google.com http://google.com

180-220ms to curl https://google.com https://google.com https://google.com https://google.com https://google.com

So should be able to download something like 20-30 copies of google with keep-alive, and ~10 without.


You're absolutely right and speeds to Google have gotten much faster over time. For what it's worth, I just ran a quick ping test and got 26ms average latency to google.com so I should have checked before I posted. Thanks for keeping me in check.

Still this is dependent on where you are, your ISP, other general factors, and it's still within an order of magnitude which was the point of the "quiz"


Great idea, but this is complicated by having to model the Python interpreter.


Agreed, it would be better to stick to a low level language for this. It's made even worse by the fact that some operations will be internally delegated to C code while others are pure Python.


In the real world, when I type close to 100 wpm on keyhero.com, the highlighting of the current word can't even keep up with a human typist.


I don't have this problem. I did some profiling and it's not doing much of anything expensive. Maybe there's some ad that I blocked or something that degrades performance for you.


Or the fact that you’re not using the same computer.



I'm not sure if you've decided to switch up the usual "past discussion" with "if curious see also", but FWIW the latter suggests to me "here is more material that is tangentially relevant that you might find interesting" while the former is very direct in what it's claiming to link to.


I'm just looking for concise wording that no one will misunderstand as somehow critical of the repost.

The intention is simply link to interesting things that readers might be curious to look at. If the repost were bad we'd mark it [dupe] instead.


“Past discussions“ seems absolutely fine (and much more precise).

Maybe others need to consider if they’re overreacting to two very simple words.


It's a critical problem. There are three key areas to prioritize to continue delivering computing speed-ups:

- better software

- new algorithms

- more streamlined hardware

The performance benefits from miniaturization have been so great that, for decades, programmers have been able to prioritize making the writing of code easier rather than making the code itself run faster.

The inefficiency that this tendency introduces has been acceptable, because faster computer chips have always been able to pick up the slack.

Now, if we want to harness the full potential of these technologies, we must change our approach to computing. We will have to deliver performance the hard way.

The researchers recommend techniques like parallelizing code. Multicore technology has enabled complex tasks to be completed thousands of times faster and in a much more energy-efficient way.

For algorithms, the team suggests a three-pronged approach that includes exploring new problem areas, addressing concerns about how algorithms scale, and tailoring them to better take advantage of modern hardware. Many others will need to take these issues seriously if they want to stay competitive.

Performance growth will require new tools, programming languages and hardware to facilitate more and better performance engineering, and computer scientists being better educated about how we can make software, algorithms and hardware work together, instead of putting them in different silos.


Am I the only one who dislikes all this parallelism hype? For data analysis and batch jobs on large amounts of data it would be great. Other consumer-oriented applications should run fine on a single core.

It hurts when people dismiss otherwise perfectly fine languages like OCaml saying "No multicore", as if the majority of tasks need some kind of parallelism. JS and Python don't do multicore well either.


I don't think the explanation for fill_array.c and fill_array_out_of_order.c is correct. Unless you're running on a massive server, you're not getting anywhere near 300MB of cache.

Modern CPUs have optimizations that bypass L1 and L2 cache allocation for a continuous burst of writes without reads, so the result here is main memory write speed, not cache allocation.


Both examples read from the array before writing, no? So they have to read on each iteration.

I don't have a super solid grasp on caching but it seems like his method of out-of-order referencing will still be hitting a valid L1 cache most of the time, so this understates the problem. Am I misunderstanding?


The cartoon picture is that the first example will read everything into cache once, whereas the second example will read everything into cache twice.

Cache lines are typically 64-bytes, so to write a single character to main memory the following things happen (again, a cartoon picture): First read the 64-bytes area that contains the byte of interest so that it is owned by my cache (this is called a RFO, "read-for-ownership"). Second, update the byte of interest. Thirdly (at some point) write the cache-line back to main memory.

In the sequential case, we just read one 64-byte cache line at a time, update those 64 chars, then write the cache line back to main memory.

In the second example, we first update all the even-indexed characters, which still forces us to read in every cache line. Then we loop around and do the odd-indexed characters, at which point we have to read the cache lines all over again (assuming the array is big enough that the whole thing can't fit in cache at once).


Am I misreading the second example's algorithm? Isn't it allocating like this:

1, 2, 4 3, 8 7 6 5, 16 15 14 13 12 11 10 9, ...

And so on?

---

Also this part:

>Cache lines are typically 64-bytes

Right, but I thought when you access an index it caches quite a lot more than 64 bytes from the index. Doesn't it throw a larger chunk of the array onto multiple lines? If that's the case then the first example is making very efficient use of the cache. If the modern CPUs are smart enough to cache backwards and I understand the second example, isn't the second too?


Ok turns out I was way off, it's actually completely broken. Just ran the code, printed j at each index.

>>> main(20)

2 4 8 16 12 4 8 16 12 4 8 16 12 4 8 16 12 4 8 16

It's not even hitting odd indexes. Over half the array will be garbage at the end. I guess that would count as out-of-order though.
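Assuming the benchmark's update rule is `j = (2 * j) % NUMBER` with `j` starting at 1 (which is what the printout above suggests), the sequence is easy to reproduce:

```python
# reproduce the index sequence of the out-of-order example for NUMBER = 20
NUMBER = 20
j = 1
seq = []
for _ in range(NUMBER):
    j = (2 * j) % NUMBER  # powers of 2 mod NUMBER
    seq.append(j)
print(seq)
# [2, 4, 8, 16, 12, 4, 8, 16, 12, 4, 8, 16, 12, 4, 8, 16, 12, 4, 8, 16]
```

The doubling gets stuck in the short cycle 4 → 8 → 16 → 12 → 4, and no odd index is ever written.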


Yep, I misread it also: I saw j = 2 * i (which would do evens and then odds when NUMBER is odd, or evens then evens again if NUMBER is even).

For what it really is - powers of 2 mod NUMBER - when NUMBER is large most reads should be out of the cache. So the first example has to read from main memory only every 64th index, and the second example has to read from main memory on almost every read. I think this agrees with what you are saying. This also explains why it is ~5x slower, which seemed too large from my previous understanding.


>So the first example has to read from main memory only every 64th index

Wouldn't it be 8 per cache line (an int is 64 bits, each cache line is 64 bytes)? I'm also assuming it caches a larger chunk of the array across multiple lines. Is that not how it works?

But I think there's a more fundamental issue here, which is that the amount measured, 68 million bytes in a second, is what - 65MB? Did he just reduce the array size until it completed in a second? Because a very significant chunk of that is going to fit in L3 cache (on an i7 it's 8MB), so even if you had a good random access algorithm, it would understate the problem because the data is still contiguous.

Which seems kinda dumb to me, since the real-world problem you're likely to run into is when your data is stored non-contiguously because it's scattered across multiple different structs/objects, making it impossible to utilise the cache to a significant degree at all. In that (very common under OO or interpreted languages) situation I'd expect a way more dramatic slowdown.


Very interesting! I think every dev should have a sense of how long loops like these take, if only to have a good starting point for where to optimise code (only if needed - * insert quote on premature optimisation *).


Anyone interested in how much your brain can do in a second? I'm making a mindmap/chart for that in my quest to better understand natural intelligence and how it could be replicated as AI.


The brain is always a fun comparison. It's this crazy super-scalar architecture that can do thousands of trillions of primitive operations per second (petaflops), yet with pretty terrible latencies: propagation through synapses on the order of single-digit milliseconds, individual neurons firing on the order of once per second, input-processing latencies in the hundreds of milliseconds, and primitive "logical" algorithms like "how many numbers can you count in a second?" netting out to "less than 10".

So we end up in this state where what the brain does well computers are still (comparatively) terrible at, and what computers do well the brain is (comparatively) terrible at. We're slowly bridging that divide, with e.g. TPUs focussing on larger volumes of low precision operations happening in parallel, but we've got a long way to go yet.


Something to think about: back in the 80s, when computers were millions of times slower than today, they were still considered so fast that it was worth sacrificing most of that performance by letting people work in an interpreted language (BASIC) -- and people were still able to be productive and do real work with them.


Millions, no. Thousands, yes.

Keep in mind that those 80s interpreted languages still have less overhead than the HLLs of today.


No, a typical desktop computer today is millions of times faster than a typical 1980s 8-bit computer.

A Commodore 64 ran a 6510 at ~1MHz (depending on NTSC or PAL). It was a single-core CPU with no pipelining or superscalar features, and it took multiple cycles to complete a single operation, putting its performance in the hundreds of thousands of operations (of any kind) per second.

The PlayStation 4, a consumer-grade entertainment device, provides around 1.84 TFLOPS of GPU compute.

Just a CPU, like the AMD 3990X, is rated at 2.3 million MIPS, while the 6502 at 1MHz is rated at less than 0.43 MIPS.
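Taking those two MIPS figures at face value, the ratio really does land in the millions:

```python
# MIPS figures quoted above: AMD 3990X vs. a ~1 MHz 6502
modern = 2_300_000   # MIPS
c64 = 0.43           # MIPS

ratio = modern / c64
print(f"{ratio:,.0f}x")   # roughly 5.3 million times faster
```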


Grep bytes surprised me. Why is it the exact same as write to memory? Is reading from memory much faster than writing or something? Even then I'm not sure how it wasn't CPU bound to compare a 4 char string 2 billion times...


Grepping "blah" over a sequence of zeros is definitely going to be faster than regular grepping.


Ohhh it's a sequence of zeroes.. oops I missed that part. Makes more sense now.


On mobile Firefox (at least for me) nothing appears below the intro message. The last line I see is "Made for you by ...".


How is the 'write a byte to disk' loop faster than the empty loop? Maybe xrange() is dominating in the empty loop example.


It's number of bytes, not loop iterations.


Thanks; I should have done more than glance at it.


Right, each iteration writes a chunk of 1,000,000 bytes.


As others have commented, it's bytes per second. But also, they are not flushing the file to disk, so this might really just be writing to memory (in the form of the OS's page cache).
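A sketch of that distinction (loop count and chunk size are made up, and timings are machine-dependent): without flush() + fsync(), write() usually just lands in the page cache:

```python
import os
import tempfile
import time

chunk = b"\0" * 1_000_000   # one megabyte per iteration, as in the article

with tempfile.NamedTemporaryFile() as f:
    t0 = time.perf_counter()
    for _ in range(20):
        f.write(chunk)                # buffered: typically page cache only
    buffered = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(20):
        f.write(chunk)
        f.flush()
        os.fsync(f.fileno())          # force the data out to the device
    synced = time.perf_counter() - t0

print(f"buffered: {buffered:.3f}s  fsync'd: {synced:.3f}s")
```

On most machines the fsync'd loop is dramatically slower, which suggests the article's number is closer to memory/page-cache bandwidth than raw disk bandwidth.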


Because it's writing one megabyte per iteration.


Great idea! However, I found the hash part a bit confusing; why not use hashes per second or bytes per second for both of them?


How fast can a single-board computer such as the Raspberry Pi compute the solution to a linear programming problem (i.e., a mathematical optimization problem with constraints of the form Ax ≤ b) that has, literally, trillions of constraints? Can it even be done at all, given the huge pool of constraints to navigate through? And how would that compare to the computations done in the middle of the twentieth century on far weaker computers?


For the final one, I'd like to see a comparison showing how many elements can be set for a linked list in one second.


> 342,000,000

> bytes written in one second

High end NVMe ssd can probably do 3x to 10x better than ~300MB/second sequential write.


Indeed. I have 2TB of storage that is only 5 times slower than my 32GB of main memory (4GB/s vs 20GB/s).


"A newer computer won't make your code run 1000x faster :) "

unless it is quantum?


  for (s = i = 0; i < NUMBER; ++i) {
      s += 1;
  }
I'm surprised gcc can't figure out that this is NUMBER*(NUMBER-1) and eliminate the loop entirely.


> I'm surprised gcc can't figure out that this is NUMBER * (NUMBER-1)

Huh? Do you mean NUMBER instead? If it were s += i then that would be NUMBER * (NUMBER-1) / 2.


You're right, I forgot the / 2.


It probably can, at high-enough optimization level.


They used -O2, which produces the straightforward loop. [0]

But with -O3, for NUMBER > 51, it uses some other odd loop. [1]

Seems to be O(n) no matter what though.

[0]: https://godbolt.org/z/czcaYf [1]: https://godbolt.org/z/sP194Y


Clang knows what’s up: https://godbolt.org/z/7zKffz. GCC is adding the numbers four at a time by stuffing 0, 1, 2, 3 in an XMM register, taking a packed addition within the register, and then doing a pairwise packed addition of the register with 4, 4, 4, 4 to get the next four numbers.


gcc and clang both perform this optimization for me at -O3.


Thanks. On godbolt, I see clang do it at -O3. But gcc still has a loop, where the body has add but no mul, although I don't fully understand the body. Where do you see it in gcc?

https://godbolt.org/z/sP194Y


Did you mean s += i; ?


This is a lot of fun! Why the use of Python 2?



Ah thank you


I’m still salty that Python 3 fragmented the community so much. Both sides of the debate are equally shrill.

Hopefully they won’t merge that latest pattern matching proposal. Talk about impossible to backport code...


At this point, I think the Python 2 side is much less shrill simply because they've started dying out.


Python 4 will do a much better job :-)


>If we just run /bin/true, we can do 500 of them in a second, so it looks like running any program has about 1ms of overhead.

This is off by at least three orders of magnitude, at least on my machine.

    $ time for i in $(seq 1000000); do true; done

    real 0m1.049s
    user 0m1.037s
    sys  0m0.019s
This article would be better if it were about computers in general, not about Python. I specifically avoid using Python for anything serious because I find its performance impossible to reason about. Then again, perhaps it's a good thing to have examples that require thinking rather than just reciting the machine's specs like most of these lists I've seen.


Your "true" is probably a shell builtin. Try it with /bin/true instead.
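The external-command overhead is easy to measure from Python too (a sketch; assumes /bin/true exists, and timings are machine-dependent):

```python
import subprocess
import time

n = 200
t0 = time.perf_counter()
for _ in range(n):
    subprocess.run(["/bin/true"])     # fork/exec a real process each time
per_call = (time.perf_counter() - t0) / n
print(f"~{per_call * 1000:.2f} ms per /bin/true")
```

A shell builtin skips the fork/exec entirely, which is why the loop over the builtin runs about a thousand times faster.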


Two tips, if you're using Bash:

1. Use the external "true" command.

2. Use {a..b} to expand sequences of numbers as "seq" adds extra overhead.

    $ time for i in {1..500}; do /bin/true; done

    real    0m0.983s
    user    0m0.650s
    sys     0m0.423s



