I should have been clearer, now that I reflect on what I wrote. All in-world logic runs on a single thread. Tasks like networking, audio, and rendering can be split into separate threads, but that can introduce issues. So for the most part, Minecraft is single-threaded. Having a second core to offload networking and other lightweight tasks to can be beneficial, and even necessary on a larger server. Beyond 2 cores, though, I would expect to see little difference in performance with each additional core.
As I understand it, the Minecraft game loop is single-threaded. There is not a thread per user or anything like that. Additional threads might be used for supplementary tasks by the OS, but increasing the number of CPU cores will not let you process more players, more ticking entities, or anything like that.
The CPU graphs presented actually seem to show a rather inefficient situation: a single thread being scheduled across multiple cores. Ideally, you would get better overall performance if a single core were dedicated to Minecraft's main game loop thread and were running at 100%.
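To make the single-threaded argument concrete, here is a rough model of a tick loop that targets 20 ticks per second (a 50 ms budget per tick). It is only a sketch: the class and method names are made up for illustration, it is not Minecraft's actual loop, and it ignores any work that happens to be offloaded to other threads.

```java
public class TickLoopSketch {
    // Minecraft targets 20 ticks per second, i.e. a 50 ms budget per tick.
    public static final double TARGET_TPS = 20.0;
    public static final double TICK_BUDGET_MS = 1000.0 / TARGET_TPS;

    /**
     * TPS a strictly single-threaded loop can sustain when one tick costs
     * tickCostMs of work on one core. Hypothetical helper for illustration.
     * Note that the number of cores does not appear anywhere in the formula.
     */
    public static double achievableTps(double tickCostMs) {
        if (tickCostMs <= TICK_BUDGET_MS) {
            return TARGET_TPS; // the loop sleeps for the rest of the budget
        }
        return 1000.0 / tickCostMs; // the loop falls behind
    }

    public static void main(String[] args) {
        for (double cost : new double[] {10, 50, 100, 200}) {
            System.out.printf("tick cost %.0f ms -> %.1f TPS%n", cost, achievableTps(cost));
        }
    }
}
```

Under this model, a tick that takes 100 ms of work drops the server to 10 TPS no matter how many idle cores are sitting next to the busy one.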
Everything else stands, however. The bytes per second, coupled with the data requirements of mods, are what is going to make scaling hard. Vanilla is one thing, but data-hungry mods are difficult to account for, especially when, as with MFFS, they scale with usage.
By that logic, a basic Core 2 Duo running at similar speeds would adequately support my current modpack, yet it does not. Multicore usage increases in proportion to demand on the server. This is far less efficient than dedicating individual tasks to individual cores; however, it is still better than forcing a single core to run every server calculation.
If you don't believe me (which appears to be the case), you can test the Minecraft single-core myth yourself. Grab VMware or VirtualBox, set it to a single core, and install your OS of choice. For my tests I installed a 64-bit copy of Windows XP I had lying around. I was unable to produce a stable tick rate under a single core. Alternatively, tinker with the processor affinity.
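If you try the VM or affinity route, it is worth confirming that the restriction actually reached the JVM before reading anything into the tick rate. A minimal sketch (the class name is made up; `Runtime.availableProcessors()` is standard Java and, as far as I know, generally reflects the VM core count or affinity mask in effect when the JVM started):

```java
public class CoreCheck {
    /** Number of processors the JVM believes it can use. */
    public static int visibleCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        // Run this under the same VM or affinity settings as the server.
        System.out.println("Cores visible to the JVM: " + visibleCores());
    }
}
```

If this still prints the full core count of the machine, the limit was not applied to the Java process and the test is not measuring what you think it is.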
Here are some basic tests from doing just that. One core: https://i.imgur.com/A11IhZI.png It does not paint a pretty picture. I could barely type out the TPS command, so generating chunks was out of the question.
Two cores: https://i.imgur.com/NvsFtvc.png Only a marginal increase in tick rate. I wasn't generating chunks under this load either.
With all four cores activated: https://i.imgur.com/CPzKpJ5.png I've circled a few things. Red circles indicate idle activity. Blue circles indicate chunk generation. The increase is nearly symmetrical across every core, and even a few of the extra threads are being used. A single thread, yes, but scaled across multiple cores.
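Per-core graphs cannot tell you whether one thread is hopping between cores or several threads are doing real work. One way to separate the two is to look at CPU time per thread from inside the JVM. The sketch below uses the standard `java.lang.management` API. The class and method names are made up, and it only sees threads in its own JVM, so to inspect a server you would have to run the equivalent inside the server process (or over JMX), which is an assumption about your setup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

public class ThreadCpuSketch {
    /** CPU time in milliseconds consumed so far by each live thread in this JVM. */
    public static Map<String, Long> cpuMillisByThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new LinkedHashMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot measure per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long nanos = mx.getThreadCpuTime(id);
            if (info == null || nanos < 0) {
                continue; // thread already exited, or measurement is disabled
            }
            result.put(info.getThreadName() + "#" + id, nanos / 1_000_000);
        }
        return result;
    }

    public static void main(String[] args) {
        cpuMillisByThread().forEach(
                (name, ms) -> System.out.println(name + ": " + ms + " ms CPU"));
    }
}
```

If one thread accounts for nearly all of the CPU time while every core shows moderate load, the OS scheduler is moving that one thread around. If several threads show significant CPU time, the work really is being shared.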
If you're hosting a server and, for whatever mysterious reason, you wish to leave yourself with less headroom, don't let me stop you. In layman's terms, you have two full glasses of liquid and four empty glasses of equal size to the first two. If you pour everything into a single glass, some of the liquid is going to spill over. If you pour everything into two glasses, they'll both fill up without spilling; however, there's no room for any more liquid. What do you propose is going to happen when I come along with two more glasses of liquid? Now, if the liquid is distributed across the four empty glasses, you're left with plenty of room for more. It's a rather juvenile explanation, but it works.
I don't have a machine with sixteen cores or multiple physical processors, so I can't test against that platform. So long as you're using a modern OS with an updated version of Java, it should still scale across the two chips.