Ever since I started volunteering for Team Liquid, I've been responsible for the server handling all of our websites. Yes, that's singular - TL, Liquipedia, Liquid Dota, Liquid Hearth, Liquid Legends and our team site all run from one server! When TL was created back in 2001, the site was running on shared web hosting combined with some free ISP web space. Things have changed a lot since then - we've made numerous upgrades over the years, some notable ones being for TSL and then to handle the traffic from SC2.
Our last major upgrade was in 2010 when I started working for TL full time right as the SC2 traffic was starting to ramp up. We purchased a dedicated server at Voxel (later bought by Internap) for a great price at the time through some contacts. Over the years we made some minor upgrades, doubling the RAM and adding a larger SSD, but seven years later it's time for something new (and I'm also afraid of the HDDs dying any day now). Here's our new server at OVH which will hopefully be entering production some time this week.
Network
  500mbps w/3gbps burst, no data limit, DDoS protected
Backup
  Old: Daily, proprietary (R1Soft), 200 GB space
  New: Hourly, off-site ZFS snapshots, 2 TB space
Linux
  Old: Debian, 3.16.0-4-amd64
  New: Debian, 4.9.0-0.bpo.2-amd64
Location
  Old: NYC, United States
  New: Montreal, Canada
From the specs alone, this is a significant upgrade over our current hardware. With the 4th generation Xeon CPUs running at higher clock speeds, webpage generation should be much faster overall, especially on Liquipedia where non-cached pages are CPU bound. The large amount of RAM keeps our entire database resident in memory and allows us to bump up the size of our Varnish storage for cached wiki pages. The increased number of cores enables more pages to be rendered simultaneously; the current server quickly maxes out during big events due to heavy editing traffic on Liquipedia.
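For context, the Varnish cache size is set with the -s storage argument when the daemon starts. A minimal sketch of what a larger in-memory cache could look like (the 32G figure, port and VCL path are placeholders, not our actual configuration):

    # illustrative only: listen on port 6081 with a 32 GB in-memory cache
    varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,32G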
Moving away from a proprietary backup system (R1Soft requires binary kernel modules) allows full flexibility with the kernel and I've chosen to use ZFS on Linux along with the latest Debian backported kernel. ZFS is an enterprise copy-on-write file system with excellent data integrity features, compression, snapshots and more. The new kernel also makes technology like Google's BBR TCP congestion control available, which greatly improves data transfer speed over higher latency connections.
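For those curious, enabling BBR on a 4.9+ kernel comes down to a couple of sysctl settings. A minimal sketch (the file name is arbitrary, and these are the commonly recommended values rather than necessarily what we run):

    # /etc/sysctl.d/90-bbr.conf
    net.core.default_qdisc = fq          # BBR is normally paired with the fq qdisc
    net.ipv4.tcp_congestion_control = bbr

    # apply and verify
    sysctl --system
    sysctl net.ipv4.tcp_congestion_control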
The server location has moved from Internap in New York City, USA to OVH in Montreal, Canada. This should have a minimal effect on latency and it allows us to have geo-distributed backups - the current server only backs up to the same datacenter in NYC whereas the new server backs up to a different continent entirely (to a server in France). While OVH do have polarizing reviews, I've been running my personal sites there for a while without any issues and feel confident enough to use them for TL. I have a feeling that a lot of the negative reviews come from people who don't understand that they are a fully unmanaged provider.
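As a rough illustration of how hourly off-site ZFS replication can work, it boils down to taking a snapshot and sending the incremental difference over SSH (the pool, snapshot and host names below are made up for the example):

    # take an hourly recursive snapshot of the pool
    zfs snapshot -r tank@hourly-2017-03-21-00
    # send only the changes since the previous snapshot to the backup host
    zfs send -R -i tank@hourly-2017-03-20-23 tank@hourly-2017-03-21-00 | \
        ssh backup.example.org zfs receive -F backup/tl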
I also took the opportunity to move all our background processing tasks, such as deferred thread updates, live stream info, spam detection, etc., into systemd units, so they are properly managed and restarted if any issues occur. This should finally put things like this issue to rest.
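To give an idea, a unit for one of these tasks might look something like this (the service name, script path and timings are hypothetical, not our actual setup):

    # /etc/systemd/system/tl-stream-info.service (hypothetical example)
    [Unit]
    Description=Update live stream info
    After=network.target

    [Service]
    ExecStart=/usr/bin/php /var/www/scripts/update_stream_info.php
    Restart=on-failure
    RestartSec=10

    [Install]
    WantedBy=multi-user.target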
Sounds great! Nice moving the bg processing to systemd services.
Interesting that you mention BBR, I've looked into this recently at uni and it seems to me like people are a bit cautious in the IETF, academia, etc. when it comes to adopting BBR. They're not buying Google's results just like that, at least that's my impression. I also heard that BBR took a "shortcut" into the kernel because the Google guys (who also administer the kernel) merged it... Anyway, I myself think BBR is a great solution, but I was wondering if you had some good resources to point me to, or any other justification for your "great improvement" in data transfer speed? (Thanks!!)
Remember to turn off SELinux for better performance
That's a nice little setup. What monitoring do you run on the server? Do you have the ability to compare the loads on the new system with the loads on the old system? Will you publish them?
Though yeah, get off those HDDs before they die, I can fully appreciate that. The Intel SSDs are nice; I looked at them, but I went mostly with Samsung NVMe drives for my personal server rack because my employer gave me a bunch.
On March 21 2017 07:46 hewo wrote: I also heard that BBR took a "shortcut" into the kernel because the Google guys (who also administer the kernel) merged it...
huh? The Linux kernel is maintained by the Linux Foundation, not by Google...
Great news! Very impressive to see how well the website speeds have kept up on the older hardware. That is quite a big step up in CPU power.
Is the DB really just on a single drive? I'm surprised it is not part of a RAID at least, or protected by some kind of software fault tolerance. I can see why you would be concerned when a single drive failure would take the DB offline.
I assume for budgetary reasons this is not being setup with two identical hosts and a SAN for shared storage and failover?
I'm not very tech savvy; is there anything that improves the experience for the average viewer, or is it more of a technical upgrade to keep up with the times?
On March 21 2017 13:56 Shock710 wrote: I'm not very tech savvy; is there anything that improves the experience for the average viewer, or is it more of a technical upgrade to keep up with the times?
Faster and more stable site, better handling of viewer spikes, and maybe less maintenance downtime.
I always wondered why TL often performed quite slowly, but now that I've seen the old setup, I'm wondering even more why it didn't perform even slower. Looking forward to the performance boost!
On March 21 2017 07:13 R1CH wrote: Our last major upgrade was in 2010 when I started working for TL full time right as the SC2 traffic was starting to ramp up.
Are you preparing for the traffic that BW Remastered will generate? Good job and thanks for continuously improving the sites/servers.
On March 21 2017 21:15 RHoudini wrote: 2x Xeon E5-2690v3 @ 2.60 GHz
Hmm, that's slightly outdated. If you're building a new server for the next 5 years, why not use the latest hardware (v4 Xeons)?
We don't have many options here; generally, dedicated server providers don't offer the latest hardware until it's already several years old due to compatibility and price. In any case, the performance difference between them for single-threaded workloads (mostly what we have) actually favors the v3 (http://www.cpubenchmark.net/compare.php?cmp[]=2364&cmp[]=2780).
On March 21 2017 21:15 RHoudini wrote: 2x Xeon E5-2690v3 @ 2.60 GHz
Hmm, that's slightly outdated. If you're building a new server for the next 5 years, why not use the latest hardware (v4 Xeons)?
We don't have many options here; generally, dedicated server providers don't offer the latest hardware until it's already several years old due to compatibility and price. In any case, the performance difference between them for single-threaded workloads (mostly what we have) actually favors the v3 (http://www.cpubenchmark.net/compare.php?cmp[]=2364&cmp[]=2780).
At the same price level, a 14-core 2690 v4 should definitely be a better choice than a 12-core 2690 v3. v4 Xeons are marginally faster than v3 at the same clock speed, consume less power, and offer more cores at the same price point. v4 also supports higher-clocked memory (2400 MHz vs 2133 MHz). But of course the differences are marginal compared with the giant leap from the current server.
On March 21 2017 07:13 R1CH wrote: Our last major upgrade was in 2010 when I started working for TL full time right as the SC2 traffic was starting to ramp up.
Are you preparing for the traffic that BW Remastered will generate? Good job and thanks for continuously improving the sites/servers.
You'd better be ready for that, when all the newcomers (or returning players) come hitting the TL strategy forums and the BW Liquipedia.
Hm... I have to check this at home later... because at work I have similar problems with the German gaming-news site readmore.de ... could be a connection!
I'm aware of the random timeouts. Something is causing the webserver to get stuck at 100% CPU for some requests, causing all other requests to that worker to time out. Very strange, I've never seen anything like this before. I'm debugging it, but it may be a while before a fix since it's very random when it happens.
On March 24 2017 21:19 R1CH wrote: I'm aware of the random timeouts. Something is causing the webserver to get stuck at 100% CPU for some requests, causing all other requests to that worker to time out. Very strange, I've never seen anything like this before. I'm debugging it, but it may be a while before a fix since it's very random when it happens.
On March 24 2017 22:14 R1CH wrote: Just applied a patch to the webserver that I hope will fix the issue.
I wouldn't have thought that this "while" would be less than an hour! Nice work, R1CH!