The COVID-19 pandemic has repeatedly disrupted daily life and work. Too much has happened to me in the past six months: family members fell seriously ill, and my grandmother passed away. With so much to deal with, I stopped updating my blog for half a year.

With the economic downturn, layoffs have begun across all industries, and many companies have started scaling back their business. Taking advantage of some recent free time, I will briefly summarize my previous CDN work experience.

I. Analysis of large-scale website architecture

In the mobile Internet era, the architecture of a large-scale, high-concurrency website is never simple. I will briefly summarize it from the following aspects:

1. Client:

The device layer mainly includes PCs, mobile phones, smart devices, etc.

The browser layer mainly includes Chrome, Microsoft IE/Edge, Safari, Firefox, Opera, etc. Statistics here are mainly grouped by browser engine (kernel).

Protocol layer:

  • (1) Ordinary web pages: HTTP/1.1, HTTP/2, QUIC
  • (2) Streaming media: RTMP, HTTP-FLV, HLS (m3u8)

2. CDN layer:

For small and medium-sized websites, a commercial CDN is generally used.

For large and extra-large websites, a combination of self-built and commercial CDNs is recommended, which I will discuss in detail later.

A CDN has two main components: the cache layer and the scheduling layer.

Depending on the business type, CDN acceleration is generally divided into static acceleration, dynamic acceleration, download acceleration, audio acceleration, video acceleration, live streaming acceleration, P2P acceleration, and security acceleration.
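To make the split between the two components concrete, here is a minimal Python sketch under simplifying assumptions: the cache layer is modeled as a TTL cache that goes back to the origin on a miss, and the scheduling layer simply picks the node with the lowest measured latency. The names `CacheNode`, `fetch_from_origin`, and `pick_node` are hypothetical and are not taken from any real CDN product.

```python
import time
from typing import Callable, Dict, Tuple


class CacheNode:
    """A toy edge node: serves from cache, falls back to the origin on a miss."""

    def __init__(self, name: str, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 60.0) -> None:
        self.name = name
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin callback
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[float, bytes]] = {}  # url -> (expiry, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = time.monotonic()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: go back to the origin and refresh the cache.
        self.misses += 1
        body = self.fetch_from_origin(url)
        self._store[url] = (now + self.ttl_seconds, body)
        return body


def pick_node(latency_ms: Dict[str, float]) -> str:
    """Scheduling layer (simplified): choose the node with the lowest latency."""
    return min(latency_ms, key=latency_ms.get)


if __name__ == "__main__":
    def fake_origin(url: str) -> bytes:
        return b"content of " + url.encode()

    nodes = {name: CacheNode(name, fake_origin) for name in ("node-a", "node-b")}
    node = nodes[pick_node({"node-a": 35.0, "node-b": 12.0})]
    node.get("/index.html")  # first request: miss, fetched from the origin
    node.get("/index.html")  # second request: hit, served from the cache
    print(node.name, node.hits, node.misses)
```

Real scheduling layers weigh far more than latency (carrier, region, node load, cost), so the `pick_node` rule here is only a placeholder for that decision.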

3. Static service:

Front-end business forms are generally divided into H5, mini programs, Android, and iOS, and the main elements are HTML/CSS/JS.

Mainstream front-end frameworks: Vue/React/Angular.

The server side is generally built with Nginx or OpenResty, where SSL certificates, cache expiration times, cross-origin (CORS) configuration, etc. can be set.
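At the HTTP level, the expiration time and cross-domain settings mentioned above come down to response headers. The following Python sketch shows this with the standard library only; it is not an Nginx configuration. The one-day cache lifetime and the origin `https://www.example.com` are example values chosen for illustration, and TLS (the SSL certificate) is assumed to be terminated by a server in front of it.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict


def static_headers(max_age_seconds: int, allowed_origin: str) -> Dict[str, str]:
    """Headers that express cache expiration and cross-origin access."""
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "Access-Control-Allow-Origin": allowed_origin,
    }


class StaticHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory with the headers above."""

    def end_headers(self) -> None:
        # Example values only: cache for one day, allow one hypothetical origin.
        for name, value in static_headers(86400, "https://www.example.com").items():
            self.send_header(name, value)
        super().end_headers()


def serve(port: int = 8080) -> None:
    """Start a local plain-HTTP server; call this manually to try it out."""
    ThreadingHTTPServer(("127.0.0.1", port), StaticHandler).serve_forever()
```

A CDN edge node typically honors the `Cache-Control` lifetime set by the origin unless the CDN console overrides it, which is why this setting belongs on the static server in the first place.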

4. Dynamic services:

Dynamic services are generally written in the company's main programming language, usually PHP, Java, Go, Python, etc.

Currently, most companies run dynamic services in Docker or Kubernetes (k8s) clusters.

For dynamic content, if dynamic and static content cannot be separated and a CDN must be used, it is recommended to configure a back-to-origin strategy to reduce routing overhead.

5. Data cache layer:

This layer mainly uses Redis or Memcached. The cloud-native ecosystem also uses etcd or TiKV for caching.

6. Database server:

Traditional relational databases mainly include MySQL, MariaDB, PostgreSQL, SQL Server, Oracle, and DB2.

For non-relational databases (NoSQL), the main representatives are MongoDB, HBase, Cassandra (originally from Facebook), etc.

As the requirements for performance and massive data keep rising, new distributed databases are developing rapidly, such as TiDB (open-sourced by PingCAP), Greenplum, Apache Doris (open-sourced by Baidu), ClickHouse, etc.

7. Distributed data storage:

Ceph, HDFS, GlusterFS, or GFS is generally used, with the choice depending on the type of scenario.

At present, Ceph and HDFS are the mainstream choices. Many cloud vendors use Ceph as their underlying storage, mainly because it supports block storage, object storage, and file system storage. In addition, OpenStack uses it by default, and its support for k8s is relatively complete. HDFS mainly serves massive data storage, is generally used with Hadoop and Spark, and its support for cloud-native environments is not very good.

II. Self-built CDN or commercial CDN?

1. Choosing a CDN

Whether to build your own CDN or use a commercial one should be decided based on the company's business scale, cost, and scenario.

For CDN acceleration of ordinary static files, if traffic is below 10 million PV, do not consider a self-built CDN. A commercial CDN can meet the need, and the selection should favor CDNs with low prices and good service quality.

For CDN acceleration of large file downloads, if the company does not have abundant bandwidth, building your own CDN is not recommended; use a commercial CDN instead.

For audio and video-on-demand acceleration, if the R&D team is not strong enough, a commercial CDN is recommended, preferably one with low prices, many nodes, and good performance.

Live streaming acceleration places high demands on user experience, so the main consideration is the network quality of the CDN nodes. CDNs from large cloud vendors are recommended.

For dynamic acceleration and security acceleration, give priority to CDN vendors with excellent back-to-origin links and comprehensive security policies.

2. How to build a CDN yourself?

For server hardware, choose cost-effective servers, weighing hard disk, memory, and CPU;

For data center bandwidth, carrier resellers are preferred, and the selection criteria are: large data center egress capacity, high-quality links, and hardware-based attack protection;

For cache nodes, the usual choices are Nginx, Apache Traffic Server, Varnish, or Squid;

For scheduling systems, a third-party intelligent CDN scheduling service is preferred, and teams with sufficient R&D capability can consider BIND views or BIND DLZ;
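The idea behind view-based scheduling is that the DNS server looks at the client's source address and returns a different node for each group of clients. The Python sketch below imitates that matching logic only; it is not BIND configuration. The view names, the client prefixes (documentation ranges from RFC 5737), and the node addresses are all made up for illustration.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical views: (view name, client networks, CDN node IP to answer with).
VIEWS: List[Tuple[str, List[str], str]] = [
    ("carrier-a", ["192.0.2.0/24"], "203.0.113.10"),
    ("carrier-b", ["198.51.100.0/24"], "203.0.113.20"),
]
DEFAULT_NODE = "203.0.113.30"  # answer for clients that match no view


def resolve(client_ip: str) -> Tuple[str, str]:
    """Return (view name, node IP) for a client; the first matching view wins."""
    addr = ipaddress.ip_address(client_ip)
    for view_name, networks, node_ip in VIEWS:
        if any(addr in ipaddress.ip_network(net) for net in networks):
            return view_name, node_ip
    return "default", DEFAULT_NODE
```

In practice the hard part is maintaining an accurate mapping from IP ranges to carriers and regions, which is one reason a third-party service is often the easier starting point.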

For CDN high-availability load balancing, consider LVS, Nginx, Keepalived, and HAProxy.
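As an illustration of what a load balancer does when backends have different capacities, here is a short Python sketch of smooth weighted round-robin, the selection scheme Nginx uses for weighted upstream servers. The class name and the backend names are hypothetical, and health checking and failover (the part Keepalived addresses) are deliberately left out.

```python
from typing import Dict


class SmoothWeightedRoundRobin:
    """Picks backends in proportion to their weights, spreading picks evenly."""

    def __init__(self, weights: Dict[str, int]) -> None:
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty mapping of positive ints")
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def next(self) -> str:
        # Step 1: every backend gains its own weight.
        for name, weight in self.weights.items():
            self.current[name] += weight
        # Step 2: the backend with the highest running value is chosen
        # (ties go to the backend listed first).
        chosen = max(self.current, key=self.current.get)
        # Step 3: the chosen backend is pushed back by the total weight.
        self.current[chosen] -= self.total
        return chosen
```

With weights 5, 1, and 1, the heavy backend receives five of every seven requests, but they are interleaved with the other two rather than sent in one burst.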

3. How to choose a commercial CDN?

Domestic CDN vendors mainly include Alibaba Cloud, Tencent Cloud, Wangsu, Huawei Cloud, Baidu Cloud, Kingsoft Cloud, Baishan Cloud, Qiniu Cloud, ChinaCache, Youpai Cloud, Dilian, UCloud, Qingyun, etc.

Overseas CDN vendors mainly include AWS CloudFront, Akamai, Google, Fastly, Limelight, Cloudflare, StackPath, CDNetworks, etc.

Whether domestic or overseas, when choosing a CDN, you can refer to the following indicators:

  • (1) Node distribution. The more nodes a CDN has and the closer they are to users, the better the acceleration effect.
  • (2) First screen time. The time it takes for the first screen to render when a user opens the page. The shorter it is, the better the user experience.
  • (3) Node latency. The lower the latency from the user to the CDN node, the better the network link.
  • (4) Download speed. Fast downloads on the user side indicate good server performance and network quality.
  • (5) Packet loss rate. A high packet loss rate on the user side means the overall network path from the user to the CDN node is poor.
  • (6) Back-to-origin rate. Measured as the proportion of back-to-origin requests and back-to-origin traffic. The lower the better, since it means the CDN is absorbing the traffic for you.
  • (7) Hit rate. The higher the hit rate, the more resources are served from the CDN cache, and the better the acceleration effect.
  • (8) Platform functions. Whether the features are rich, whether the basic functions are complete, and how good the console experience is.
  • (9) Operations team response speed. How long it takes to respond to and resolve an issue after it is reported.
  • (10) Scheduling strategy. Observe whether the service provider's scheduling strategy is reasonable.
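Indicators (6) and (7) can be computed directly from CDN access logs. The sketch below assumes a simplified log record with three fields (`cache_hit`, `bytes_sent`, `origin_bytes`); these field names are hypothetical, and real vendor log formats differ.

```python
from typing import Dict, Iterable, NamedTuple


class LogRecord(NamedTuple):
    """One CDN access-log entry (hypothetical, simplified fields)."""
    cache_hit: bool    # True if the response was served from the CDN cache
    bytes_sent: int    # bytes delivered to the user
    origin_bytes: int  # bytes pulled from the origin for this request (0 on a hit)


def summarize(records: Iterable[LogRecord]) -> Dict[str, float]:
    """Compute hit rate and back-to-origin rates (by requests and by traffic)."""
    rows = list(records)
    if not rows:
        raise ValueError("no records to summarize")
    total = len(rows)
    hits = sum(1 for r in rows if r.cache_hit)
    total_bytes = sum(r.bytes_sent for r in rows)
    origin_bytes = sum(r.origin_bytes for r in rows)
    return {
        "hit_rate": hits / total,
        "origin_request_rate": (total - hits) / total,
        "origin_traffic_rate": origin_bytes / total_bytes if total_bytes else 0.0,
    }
```

Request-based and traffic-based rates can diverge noticeably: a few large files that miss the cache barely move the request rate but dominate the traffic rate, so it is worth comparing vendors on both.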

III. Summary

In recent years, with the arrival of the cloud computing era, the domestic CDN industry has undergone a major reshuffle. The original CDN giants, Wangsu and ChinaCache, have been overtaken by Alibaba Cloud and Tencent Cloud; Dilian's business keeps declining; Kuaiwang was acquired by 21Vianet; Baishan Cloud has shifted to edge cloud computing; and other medium-sized cloud vendors such as Kingsoft Cloud, UCloud, Qiniu Cloud, and Youpai Cloud are gradually carving up this market.

In conclusion, no one knows where the CDN industry is headed. Judging by current momentum, vendors are investing heavily in edge computing. As 5G, artificial intelligence, and container technology continue to evolve, CDN is no longer just a standalone technology. It needs to keep integrating with other technologies and ultimately be applied in our daily lives.