How Much Do You Know About Blog Systems? Uncovering Little-Known Knowledge (4)

Last updated 3/8/2022 11:18 PM
Edi Wang's blog (汪宇杰博客)
Estimated reading time: 14 minutes

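The previous part, "How Much Do You Know About Blog Systems? Uncovering Little-Known Knowledge (3)", covered blog protocols and standards. This final part covers the key considerations in designing a blog system.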

Contents

Because the article is long, it is split into four posts, covering the following:

  1. The past and present of the "blog"
  2. My blog story
  3. Who reads blogs?
  4. Key points in designing the basic features
    • 4.1 Posts
    • 4.2 Comments
    • 4.3 Categories
    • 4.4 Tags
    • 4.5 Archives
    • 4.6 Pages
    • 4.7 Subscriptions
    • 4.8 Version control
    • 4.9 Themes and personalization
    • 4.10 Users and permissions
    • 4.11 Plug-ins
    • 4.12 Handling images and attachments
    • 4.13 Profanity filtering and comment moderation
    • 4.14 Static page generation
    • 4.15 Notification system
  5. Blog protocols and standards
    • 5.1 RSS
    • 5.2 Atom
    • 5.3 OPML
    • 5.4 APML
    • 5.5 FOAF
    • 5.6 BlogML
    • 5.7 OpenSearch
    • 5.8 Pingback
    • 5.9 Trackback
    • 5.10 MetaWeblog
    • 5.11 RSD
    • 5.12 Reader view
  6. Key considerations in designing a blog system
    • 6.1 Should all times really use UTC?
    • 6.2 HTML or Markdown
    • 6.3 MVC or SPA
    • 6.4 Security
  7. Concluding remarks

6.1 | Should all times really use UTC?

Storing times in UTC should be a well-established practice by 2020, and blog systems are no exception: all time data in my blog is ultimately stored in UTC. What is special about blogs, however, is that times should not be converted to the reader's time zone. They should be displayed in the blogger's time zone.

This is not for technical reasons: displaying times in the reader's time zone would not make the code any harder to write. The reason is that blogs exist to express the blogger's personality and give the blogger a space of their own on the Internet. Highlighting the blogger's own attributes is therefore important, and the blogger's time zone is one of the attributes that helps readers get to know the blogger. That is why authentic blog systems offer a time zone setting and convert the stored UTC time into that zone for display. Both WordPress and my Moonglade blog system work this way. Not converting times to the reader's time zone is a little-known, purely emotional design choice, but it deserves to be respected.

(Figure: Moonglade displays a post's publish time in the time zone set by the blogger)
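To make the idea concrete, here is a minimal C# sketch, assuming the blogger's time zone is stored as a setting and handed in as a TimeZoneInfo. BloggerClock and ToBloggerTime are hypothetical names for illustration, not Moonglade's actual API. Storage stays in UTC; conversion happens only at display time.

```csharp
using System;

// Hypothetical helper: converts stored UTC timestamps into the blogger's
// configured time zone for display. The database keeps UTC only.
public sealed class BloggerClock
{
    private readonly TimeZoneInfo _bloggerZone;

    public BloggerClock(TimeZoneInfo bloggerZone)
    {
        _bloggerZone = bloggerZone ?? throw new ArgumentNullException(nameof(bloggerZone));
    }

    // Converts a UTC timestamp to the blogger's local time.
    // The reader's own time zone is deliberately ignored.
    public DateTime ToBloggerTime(DateTime utc)
    {
        // A Local-kind value would mean the caller passed the server's clock, not UTC.
        if (utc.Kind == DateTimeKind.Local)
            throw new ArgumentException("Expected a UTC timestamp.", nameof(utc));

        // Values read from a database often come back as Unspecified; treat them as UTC.
        var asUtc = DateTime.SpecifyKind(utc, DateTimeKind.Utc);
        return TimeZoneInfo.ConvertTimeFromUtc(asUtc, _bloggerZone);
    }
}
```

In a real system the TimeZoneInfo would typically come from TimeZoneInfo.FindSystemTimeZoneById. Time zone IDs are platform-dependent (Windows IDs such as "China Standard Time" versus IANA IDs such as "Asia/Shanghai"), so that lookup is worth testing on the OS the blog actually runs on.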

This raises an interesting question: how do search engines know when a post was published? Ideally, you expose the UTC time to search engines without displaying it to readers. This is easy to do with the datetime attribute of the HTML5 <time> element. Since HTML5 became widespread, search engines prefer to infer the meaning of content from the element type rather than guess it from the text inside the element.

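In C#, ToString("u") formats a DateTime with the universal sortable date/time pattern (yyyy-MM-dd HH:mm:ssZ). It does not convert the value to UTC; it only appends the Z, so it must be called on a value that is already in UTC, as it is here. The Razor markup is: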

```cshtml
<time datetime="@Model.PostModel.PubDateUtc.ToString("u")" title="GMT @Model.PostModel.PubDateUtc">@DateTimeResolver.GetDateTimeWithUserTZone(Model.PostModel.PubDateUtc).ToString("MM/dd/yyyy")</time>
```

For the post in the screenshot above, the rendered HTML for the time is:

```html
<time datetime="2020-04-29 11:41:02Z" title="GMT 4/29/2020 11:41:02 AM">04/29/2020</time>
```

6.2 | HTML or Markdown

Many developers who build blog systems like to offer Markdown as the editor. For a technical blog that only you use, that is fine. But if you are building a blog system for others, remember that not everyone is a programmer, and not everyone likes Markdown.


In that case, a WYSIWYG HTML editor (such as TinyMCE) is a good choice. An HTML editor also supports richer layout and formatting than Markdown. Moonglade supports both HTML and Markdown editors.

(Figure: the TinyMCE editor used in Moonglade)

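When saving post content to the database, store the original Markdown source rather than the generated HTML, because the post has to remain editable later. Storing HTML content in encoded form is also no longer advisable. It is already 2020, and mainstream databases correctly support all kinds of exotic Unicode, such as an emoji 😂 suddenly appearing in a post. If you encode the content anyway, you may be "rewarded" with problems like the ones my blog ran into: https://github.com/EdiWang/Moonglade/issues/280. Encoding and decoding also cost performance. My Moonglade blog system has just finished removing the encoding step.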
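A minimal sketch of this storage rule, assuming a post record that carries a format flag. PostContent, ContentFormat and PostContentService are hypothetical names, and the Markdown renderer is passed in as a delegate rather than tied to any particular library. The database holds exactly what the author typed, so the editor always reloads the original source and emoji pass through untouched.

```csharp
using System;

// Hypothetical set of editor formats.
public enum ContentFormat { Html, Markdown }

// Hypothetical persisted shape: the raw source as typed by the author.
// No HTML encoding, and never the Markdown-rendered output.
public sealed record PostContent(string RawContent, ContentFormat Format);

public static class PostContentService
{
    // The editor gets the stored source back unchanged, so later edits
    // start from the original Markdown or HTML.
    public static string ForEditing(PostContent content) => content.RawContent;

    // Rendering happens at read time. The Markdown renderer is injected;
    // which library backs it is left open in this sketch.
    public static string ForDisplay(PostContent content, Func<string, string> renderMarkdown) =>
        content.Format == ContentFormat.Markdown
            ? renderMarkdown(content.RawContent)
            : content.RawContent; // HTML posts are stored and served as-is
}
```

If rendering Markdown on every request becomes a cost, the generated HTML can be cached separately; the raw source remains the single source of truth.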

6.3 | MVC or SPA

Many programmers in the community who build their own blogs prefer an SPA architecture and look down on MVC as outdated. Is that really justified? The question is like asking why airplanes don't fly in straight lines: are airlines simply bad at route planning? I wrote about this in an earlier post, "Summary of My .NET Core Blog Performance Optimization Experience":

After 2014, with the rise of SPAs, frameworks such as Angular gradually became the mainstream of front-end development. The problem they solve is improving front-end responsiveness, so that web applications feel as close as possible to native applications. Many friends have asked me: why don't you write your blog in Angular? Aren't you good at it?


It is not that simple. In fact, writing Angular is the main part of my current job, and the .NET Framework version of my blog used AngularJS and Angular 2 in its admin backend. After a series of experiments, I concluded that Angular is not worth it for a content site like my blog.

This is not surprising. Before picking a framework, keep one premise in mind: SPA frameworks are designed for web applications. An application emphasizes interaction, like the Azure Portal or Outlook mail, where the goal is to build web pages as applications. In that situation an SPA not only improves the user experience but also lowers development cost, so why not use one? Blogs, however, are content sites, not applications. At most, the blog's admin backend can be considered an application. The only interactions on the public front end are comments and search, so an SPA is a poor fit there. It is like going to the market for groceries: a bicycle is more convenient than a tank.

Microsoft's official documentation gives the same guidance on choosing between SPAs and traditional web apps:

https://docs.microsoft.com/en-us/dotnet/architecture/modern-web-apps-azure/choose-between-traditional-web-and-single-page-apps

Another reason the blog front end still uses MVC goes back to section 3 of this series, on who actually reads blogs. I have been running my blog for more than ten years, and the statistics show that almost all readers arrive from a search engine, read one article, and close the page. Now think carefully: what is one of the biggest problems an SPA solves? Improving front-end responsiveness by refreshing only part of the page. But if readers come from a search engine and leave after one article, can they benefit from partial refreshes at all? With an SPA framework, a reader who reads only one article still has to download the framework itself plus the code for navigation, interaction and other features. If 99% of readers never click anywhere else, is it worth loading a large framework just for the remaining 1%? Does that improve performance or make it worse?

Although an MVC application sends complete, server-rendered HTML on every request, 99% of readers close the page after one article, so what they download is far less than a full SPA bundle. The result is faster and more SEO-friendly. An SPA suits the blog's admin portal, not its public front end.
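The trade-off above can be written as a rough download-cost model. The symbols are illustrative assumptions, not measurements from my blog: F is the one-time size of the SPA framework bundle, d the data fetched per article by the SPA, h the size of one server-rendered article page, and n the number of articles a reader views in one visit. Caching, parsing and rendering time are ignored.

```latex
\begin{aligned}
C_{\mathrm{SPA}}(n) &= F + n\,d, \qquad C_{\mathrm{MVC}}(n) = n\,h \\
C_{\mathrm{SPA}}(n) < C_{\mathrm{MVC}}(n) &\iff F < n\,(h - d) \iff n > \frac{F}{h - d} \quad (\text{assuming } h > d)
\end{aligned}
```

For the typical search-engine reader n = 1, so the SPA only comes out ahead when F < h - d, that is, when the entire framework bundle is smaller than the difference between a full HTML page and its data payload.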

6.4 | Security

According to years of monitoring data from running my blog, the most common attacks come from fully automated vulnerability scanners. They request common accidentally exposed resources such as data.zip, wp-admin.php and the .git directory, or try to exploit known vulnerabilities in specific blog systems. The goal is to take control of the server and inject malicious code aimed at readers (such as ransomware or crypto-mining scripts) into your blog pages; some attackers also turn the server itself into a mining machine.

(Figure: requests from automated scanning tools captured in the Azure portal)
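As a rough illustration, requests for the paths mentioned above could be screened before they reach the application. This C# sketch is not a complete blocklist: ScannerProbeFilter is a hypothetical name, and the lists contain only the three examples from this section.

```csharp
using System;
using System.Linq;

// Hypothetical filter for paths that automated vulnerability scanners probe.
// Only the examples from the text are listed; extend as your logs suggest.
public static class ScannerProbeFilter
{
    // Directory names that should never be served from a blog's web root.
    private static readonly string[] BlockedSegments = { ".git" };

    // File names commonly probed for leaked backups or WordPress exploits.
    private static readonly string[] BlockedFileNames = { "data.zip", "wp-admin.php" };

    public static bool IsLikelyProbe(string requestPath)
    {
        if (string.IsNullOrEmpty(requestPath)) return false;

        // "/a/b/c" -> ["a", "b", "c"]; comparisons ignore case.
        var segments = requestPath.Split('/', StringSplitOptions.RemoveEmptyEntries);
        if (segments.Length == 0) return false;

        bool blockedDirectory = segments.Any(s => BlockedSegments.Contains(s, StringComparer.OrdinalIgnoreCase));
        bool blockedFile = BlockedFileNames.Contains(segments[^1], StringComparer.OrdinalIgnoreCase);
        return blockedDirectory || blockedFile;
    }
}
```

In ASP.NET Core, such a check could sit in a small inline middleware registered with app.Use that returns 404 before routing runs. For a .NET blog the PHP paths never exist anyway; the real value is for files such as data.zip or a .git directory that might accidentally end up in the web root.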

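When designing a blog system, OWASP (https://owasp.org/) is a good reference for common security measures, but keep some flexibility. For example, when adding a Content Security Policy (CSP) for JavaScript, remember that ordinary bloggers may need to add third-party analytics scripts (such as Azure Application Insights, or CNZZ in China), so design some kind of allow-list/block-list or feature switch.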
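A sketch of such an allow-list, assuming the blogger maintains a list of trusted third-party script hosts in the blog settings. CspBuilder is a hypothetical name, the policy is deliberately minimal, and the enableCsp flag stands in for the feature switch mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical builder for a Content-Security-Policy header value.
// Third-party script hosts come from a blogger-editable allow-list
// instead of being hard-coded.
public static class CspBuilder
{
    public static string Build(IEnumerable<string> allowedScriptHosts, bool enableCsp = true)
    {
        // Feature switch: lets the blogger turn CSP off if a plug-in breaks.
        if (!enableCsp) return string.Empty;

        var hosts = (allowedScriptHosts ?? Enumerable.Empty<string>())
            .Where(h => !string.IsNullOrWhiteSpace(h))
            .Select(h => h.Trim())
            .Distinct(StringComparer.OrdinalIgnoreCase);

        // The site's own scripts are always allowed; allow-listed hosts are appended.
        string scriptSrc = string.Join(" ", new[] { "'self'" }.Concat(hosts));

        return $"default-src 'self'; script-src {scriptSrc}; object-src 'none'; base-uri 'self'";
    }
}
```

The allow-list itself is blogger input, so entries such as a bare * would defeat the policy; validating each entry on save is worth considering.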

Most designers know to guard against input from users, that is, blog readers, and readers can usually only submit input through comments and search. But don't forget that the blogger's input in the admin backend must also be guarded against, because the person operating it may not be the blogger. For example, if a blogger's account is stolen and the attacker points a navigation-bar link at the attacker's own server, or at something prepared in advance on localhost (yes, don't assume localhost is harmless on ordinary people's computers), readers will be seriously affected.
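One way to treat admin input as untrusted is to validate link targets when they are saved. This C# sketch assumes navigation links should always be absolute http/https URLs that do not point at the reader's own machine; AdminLinkValidator is a hypothetical name. It does not catch a public host name whose DNS record points at 127.0.0.1.

```csharp
using System;
using System.Net;

// Hypothetical validator for URLs entered in the admin backend,
// such as navigation-bar links. The account may have been stolen,
// so this input gets the same scrutiny as reader input.
public static class AdminLinkValidator
{
    public static bool IsAcceptableNavLink(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return false;

        // Must parse as an absolute URI. On Unix, "/path" parses as an
        // absolute file:// URI, which the scheme check below rejects.
        if (!Uri.TryCreate(input, UriKind.Absolute, out var uri)) return false;

        // Only http and https: rejects javascript:, data:, file: and similar.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) return false;

        // Reject localhost by name and anything Uri already flags as loopback.
        if (string.Equals(uri.Host, "localhost", StringComparison.OrdinalIgnoreCase) || uri.IsLoopback)
            return false;

        // Reject literal loopback addresses such as 127.0.0.2 or ::1.
        if (IPAddress.TryParse(uri.DnsSafeHost, out var ip) && IPAddress.IsLoopback(ip)) return false;

        return true;
    }
}
```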


For authenticating admin logins, prefer a mature SSO solution wherever one is available. For example, Moonglade supports Azure Active Directory authentication, which lets a professional provider such as Microsoft handle authentication and authorization and minimizes account security issues. If the user has no SSO environment, fall back to local account authentication. Don't assume that a third-party service is less secure than code you write yourself, or that logic nobody knows about cannot be hacked. Unless you are one of the world's top experts, a system you write yourself is far more likely to be hacked than a third-party service.

Other attacks usually come from bored programmers in some hostile camp, for example using scripts or tools to request one URL of the blog system continuously, trying to knock the server over DDoS-style. Against such nuisances, the blog system designer only needs to add rate limiting to that URL endpoint. Real DDoS attacks can only be handled by cloud anti-DDoS services or hardware DDoS firewalls.
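For the nuisance traffic described above, a per-client, per-endpoint limit is often enough. This is a minimal fixed-window sketch in C#. The class name and the choice of a fixed window (rather than a sliding window or token bucket) are assumptions for illustration. The clock is injected so the behavior can be tested without waiting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-window rate limiter keyed by client (e.g. IP) and endpoint.
public sealed class FixedWindowRateLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Func<DateTime> _utcNow;
    private readonly Dictionary<string, (DateTime WindowStart, int Count)> _counters = new();
    private readonly object _lock = new();

    public FixedWindowRateLimiter(int limit, TimeSpan window, Func<DateTime> utcNow)
    {
        if (limit <= 0) throw new ArgumentOutOfRangeException(nameof(limit));
        if (window <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(window));
        _limit = limit;
        _window = window;
        _utcNow = utcNow ?? throw new ArgumentNullException(nameof(utcNow));
    }

    // Returns true if the request may proceed; false means answer with HTTP 429.
    public bool TryAcquire(string clientId, string endpoint)
    {
        var key = clientId + "|" + endpoint;
        var now = _utcNow();
        lock (_lock)
        {
            // First request, or the previous window has expired: start a new window.
            if (!_counters.TryGetValue(key, out var entry) || now - entry.WindowStart >= _window)
            {
                _counters[key] = (now, 1);
                return true;
            }

            if (entry.Count >= _limit) return false;

            _counters[key] = (entry.WindowStart, entry.Count + 1);
            return true;
        }
    }
}
```

This sketch never evicts old entries, so a real deployment would need periodic cleanup. On .NET 7 and later, ASP.NET Core also ships built-in rate limiting middleware (AddRateLimiter/UseRateLimiter), which is likely preferable to a hand-rolled limiter.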

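Finally, don't forget the things OWASP does not cover: blog protocols can have design flaws too. For example, pingback can be abused for DDoS attacks (https://www.imperva.com/blog/wordpress-security-alert-pingback-ddos/) and for scanning server ports (https://www.avsecurity.in/wordpress-xml-rpc-pingback-vulnerability/).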
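One mitigation for the port-scanning abuse is to validate a pingback's source URL before the server fetches it. This C# sketch assumes the caller has already resolved the source host (for example with Dns.GetHostAddressesAsync) and will connect to the same address it checked, so re-resolving DNS cannot bypass the check. PingbackSourceGuard is a hypothetical name, and the rules are an illustrative minimum rather than a complete defense.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical pre-fetch check for pingback source URLs. Without it, an
// attacker can make the blog request internal addresses or arbitrary ports.
public static class PingbackSourceGuard
{
    public static bool IsSafeToFetch(Uri source, IPAddress resolvedAddress)
    {
        if (source.Scheme != Uri.UriSchemeHttp && source.Scheme != Uri.UriSchemeHttps) return false;

        // Only the default web ports, so the endpoint cannot be used to probe other ports.
        if (!source.IsDefaultPort) return false;

        return IsPublicAddress(resolvedAddress);
    }

    private static bool IsPublicAddress(IPAddress ip)
    {
        // Apply IPv4 rules to IPv4-mapped IPv6 addresses (::ffff:a.b.c.d).
        if (ip.IsIPv4MappedToIPv6) ip = ip.MapToIPv4();

        if (IPAddress.IsLoopback(ip)) return false;

        if (ip.AddressFamily == AddressFamily.InterNetwork)
        {
            byte[] b = ip.GetAddressBytes();
            bool reserved =
                b[0] == 0 ||                                 // 0.0.0.0/8
                b[0] == 10 ||                                // 10.0.0.0/8 private
                (b[0] == 172 && b[1] >= 16 && b[1] <= 31) || // 172.16.0.0/12 private
                (b[0] == 192 && b[1] == 168) ||              // 192.168.0.0/16 private
                (b[0] == 169 && b[1] == 254);                // 169.254.0.0/16 link-local
            return !reserved;
        }

        if (ip.AddressFamily == AddressFamily.InterNetworkV6)
        {
            byte[] b = ip.GetAddressBytes();
            bool local =
                ip.Equals(IPAddress.IPv6Any) || // ::
                ip.IsIPv6LinkLocal ||           // fe80::/10
                ip.IsIPv6SiteLocal ||           // fec0::/10 (deprecated)
                (b[0] & 0xFE) == 0xFC;          // fc00::/7 unique local
            return !local;
        }

        return false; // Unknown address family: refuse.
    }
}
```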

7 | Concluding remarks

In designing a good blog system, every detail deserves thought. These design decisions cannot all be made correctly from the start; they have to be discovered and refined from long-term data about how the blog is actually used. Moreover, markets change, user behavior changes, and standards are retired and invented, so your system must evolve along with them.

Behind any seemingly simple system, however commonplace, lies a whole body of design that you cannot see. Blogs are like this, and e-commerce sites, food delivery platforms and financial clearing systems are even more complex. Don't start building one based only on what you see on the surface. A paper airplane and a real airplane may share a name, but building them is not remotely the same thing.

Engineers should not feel obliged to use whatever is popular. Excellent products are not made by piling up fashionable technology; you must first analyze how your users use your product before making the most appropriate choice. Remember that if you want something to succeed, your thinking cannot be limited to the technology itself. Only by learning to analyze the market and user behavior can you choose and apply technology well.


Thank you for reading this far. If you have any questions or would like to discuss anything, please leave a comment.


汪宇杰

Keep Exploring
