An hacker, Debian contributor (Debian maintainer and in process of becoming a Debian Developer) Peace lover, cool and arrogant guy
903 stories
·
107 followers

Serving WebP & AVIF images with Nginx

1 Share

WebP and AVIF are two image formats for the web. They aim to produce smaller files than JPEG and PNG. They both support lossy and lossless compression, as well as alpha transparency. WebP was developed by Google and is a derivative of the VP8 video format.1 It is supported on most browsers. AVIF is using the newer AV1 video format to achieve better results. It is supported by Chromium-based browsers and has experimental support for Firefox.2

Your browser supports WebP and AVIF image formats. Your browser supports none of these image formats. Your browser only supports the WebP image format. Your browser only supports the AVIF image format.

Without JavaScript, I can’t tell what your browser supports.

Converting and optimizing images

For this blog, I am using the following shell snippets to convert and optimize JPEG and PNG images. Skip to the next section if you are only interested in the Nginx setup.

JPEG images

JPEG images are converted to WebP using cwebp.

find media/images -type f -name '*.jpg' -print0 \
  | xargs -0n1 -P$(nproc) -i \
      cwebp -q 84 -af '{}' -o '{}'.webp

They are converted to AVIF using avifenc from libavif:

find media/images -type f -name '*.jpg' -print0 \
  | xargs -0n1 -P$(nproc) -i \
      avifenc --codec aom --yuv 420 --min 20 --max 25 '{}' '{}'.avif

Then, they are optimized using jpegoptim built with Mozilla’s improved JPEG encoder, via Nix. This is one reason I love Nix.

jpegoptim=$(nix-build --no-out-link \
      -E 'with (import <nixpkgs>{}); jpegoptim.override { libjpeg = mozjpeg; }')
find media/images -type f -name '*.jpg' -print0 \
  | sort -z
  | xargs -0n10 -P$(nproc) \
      ${jpegoptim}/bin/jpegoptim --max=84 --all-progressive --strip-all

PNG images

PNG images are down-sampled to 8-bit RGBA-palette using pngquant. The conversion reduces file sizes significantly while being mostly invisible.

find media/images -type f -name '*.png' -print0 \
  | sort -z
  | xargs -0n10 -P$(nproc) \
      pngquant --skip-if-larger --strip \
               --quiet --ext .png --force

Then, they are converted to WebP with cwebp in lossless mode:

find media/images -type f -name '*.png' -print0 \
  | xargs -0n1 -P$(nproc) -i \
      cwebp -z 8 '{}' -o '{}'.webp

No conversion is done to AVIF: lossless compression is not as efficient as pngquant and lossy compression is only marginally better than what I get with WebP.

Keeping only the smallest files

I am only keeping WebP and AVIF images if they are at least 10% smaller than the original format: decoding is usually faster for JPEG and PNG; and JPEG images can be decoded progressively.3

for f in media/images/**/*.{webp,avif}; do
  orig=$(stat --format %s ${f%.*})
  new=$(stat --format %s $f)
  (( orig*0.90 > new )) || rm $f
done

I only keep AVIF images if they are smaller than WebP.

for f in media/images/**/*.avif; do
  [[ -f ${f%.*}.webp ]] || continue
  orig=$(stat --format %s ${f%.*}.webp)
  new=$(stat --format %s $f)
  (( $orig > $new )) || rm $f
done

We can compare how many images are kept when converted to WebP or AVIF:

printf "     %10s %10s %10s\n" Original WebP AVIF
for format in png jpg; do
  printf " ${format:u} %10s %10s %10s\n" \
    $(find media/images -name "*.$format" | wc -l) \
    $(find media/images -name "*.$format.webp" | wc -l) \
    $(find media/images -name "*.$format.avif" | wc -l)
done

AVIF is better than MozJPEG for most JPEG files while WebP beats MozJPEG only for one file out of two:

       Original       WebP       AVIF
 PNG         64         47          0
 JPG         83         40         74

Further reading

I didn’t detail my choices for quality parameters and there is not much science in it. Here are two resources providing more insight on AVIF:

Serving WebP & AVIF with Nginx

To serve WebP and AVIF images, there are two possibilities:

  1. use <picture> to let the browser pick the format it supports, or
  2. use content negotiation to let the server send the best-supported format.

I use the second approach. It relies on inspecting the Accept HTTP header in the request. For Chrome, it looks like this:

Accept: image/avif,image/webp,image/apng,image/*,*/*;q=0.8

I configure Nginx to serve AVIF image, then the WebP image, and fallback to the original JPEG/PNG image depending on what the browser advertises:4

http {
  map $http_accept $webp_suffix {
    default        "";
    "~image/webp"  ".webp";
  }
  map $http_accept $avif_suffix {
    default        "";
    "~image/avif"  ".avif";
  }
}
server {
  # […]
  location ~ ^/images/.*\.(png|jpe?g)$ {
    add_header Vary Accept;
    try_files $uri$avif_suffix$webp_suffix $uri$avif_suffix $uri$webp_suffix $uri =404;
  }
}

For example, let’s suppose the browser requests /images/ont-box-orange@2x.jpg. If it supports WebP but not AVIF, $webp_suffix is set to .webp while $avif_suffix is set to the empty string. The server tries to serve the first existing file in this list:

  • /images/ont-box-orange@2x.jpg.webp
  • /images/ont-box-orange@2x.jpg
  • /images/ont-box-orange@2x.jpg.webp
  • /images/ont-box-orange@2x.jpg

If the browser supports both AVIF and WebP, Nginx walks the following list:

  • /images/ont-box-orange@2x.jpg.webp.avif (it never exists)
  • /images/ont-box-orange@2x.jpg.avif
  • /images/ont-box-orange@2x.jpg.webp
  • /images/ont-box-orange@2x.jpg

Eugene Lazutkin explains in more detail how this works. I have only presented a variation of his setup supporting both WebP and AVIF.


  1. VP8 is only used for lossy compression. Lossless compression is using an unrelated format↩︎

  2. Firefox support was scheduled for Firefox 86 but because of the lack of proper color space support, it is still not enabled by default. ↩︎

  3. Progressive decoding is not planned for WebP but could be implemented using low-quality thumbnail images for AVIF. See this issue for a discussion. ↩︎

  4. The Vary header ensures an intermediary cache (a proxy or a CDN) checks the Accept header before using a cached response. Internet Explorer has trouble with this header and may not be able to cache the resource properly. There is a workaround but Internet Explorer’s market share is now so small that it is pointless to implement it. ↩︎

Read the whole story
copyninja
1315 days ago
reply
India
Share this story
Delete

Actually switching something with the SonOff

1 Share

Getting a working MQTT temperature monitoring setup is neat, but not really what we think of when someone talks about home automation. For that we need some element of control. There are various intelligent light bulb systems out there that are obvious candidates, but I decided I wanted the more simple approach of switching on and off an existing lamp. I ended up buying a pair of Sonoff Basic devices; I’d rather not get into trying to safely switch mains voltages myself. As well as being cheap the Sonoff is based upon an ESP8266, which I already had some experience in hacking around with (I have a long running project to build a clock I’ll eventually finish and post about). Even better, the Sonoff-Tasmota project exists, providing an alternative firmware that has some support for MQTT/TLS. Perfect for my needs!

There’s an experimental OTA upgrade approach to getting a new firmware on the Sonoff, but I went the traditional route of soldering a serial header onto the board and flashing using esptool. Additionally none of the precompiled images have MQTT/TLS enabled, so I needed to build the image myself. Both of these turned out to be the right move, because using the latest release (v5.13.1 at the time) I hit problems with the device rebooting as soon as it got connected to the MQTT broker. The serial console allowed me to see the reboot messages, and as I’d built the image myself it was easy to tweak things in the hope of improving matters. It seems the problem is related to the memory consumption that enabling TLS requires. I went back a few releases until I hit on one that works, with everything else disabled. I also had to nail the Espressif Arduino library version to an earlier one to get a reliable wifi connection - using the latest worked fine when the device was power via USB from my laptop, but not once I hooked it up to the mains.

Once the image is installed on the device (just the normal ESP8266 esptool write_flash 0 sonoff-image.bin approach), start mosquitto_sub up somewhere. Plug the Sonoff in (you CANNOT have the Sonoff plugged into the mains while connected to the serial console, because it’s not fully isolated), and you should see something like the following:

$ mosquitto_sub -h mqtt-host -p 8883 --capath /etc/ssl/certs/ -v -t '#' -u user1 -P foo
tele/sonoff/LWT Online
cmnd/sonoff/POWER (null)
tele/sonoff/INFO1 {"Module":"Sonoff Basic","Version":"5.10.0","FallbackTopic":"DVES_123456","GroupTopic":"sonoffs"}
tele/sonoff/INFO3 {"RestartReason":"Power on"}
stat/sonoff/RESULT {"POWER":"OFF"}
stat/sonoff/POWER OFF
tele/sonoff/STATE {"Time":"2018-05-25T10:09:06","Uptime":0,"Vcc":3.176,"POWER":"OFF","Wifi":{"AP":1,"SSId":"My SSID Is Here","RSSI":100,"APMac":"AA:BB:CC:12:34:56"}}

Each of the Sonoff devices will want a different topic rather than the generic ‘sonoff’, and this can be set via MQTT:

$ mosquitto_pub -h mqtt.o362.us -p 8883 --capath /etc/ssl/certs/ -t 'cmnd/sonoff/topic' -m 'sonoff-snug' -u user1 -P foo

The device will provide details of the switchover via MQTT:

cmnd/sonoff/topic sonoff-snug
tele/sonoff/LWT (null)
stat/sonoff-snug/RESULT {"Topic":"sonoff-snug"}
tele/sonoff-snug/LWT Online
cmnd/sonoff-snug/POWER (null)
tele/sonoff-snug/INFO1 {"Module":"Sonoff Basic","Version":"5.10.0","FallbackTopic":"DVES_123456","GroupTopic":"sonoffs"}
tele/sonoff-snug/INFO3 {"RestartReason":"Software/System restart"}
stat/sonoff-snug/RESULT {"POWER":"OFF"}
stat/sonoff-snug/POWER OFF
tele/sonoff-snug/STATE {"Time":"2018-05-25T10:16:29","Uptime":0,"Vcc":3.103,"POWER":"OFF","Wifi":{"AP":1,"SSId":"My SSID Is Here","RSSI":76,"APMac":"AA:BB:CC:12:34:56"}}

Controlling the device is a matter of sending commands to the cmd/sonoff-snug/power topic - 0 for off, 1 for on. All of the available commands are listed on the Sonoff-Tasmota wiki.

At this point I have a wifi connected mains switch, controllable over MQTT via my internal MQTT broker.

(If you want to build your own Sonoff Tasmota image it’s actually not too bad; the build system is Ardunio style on top of PlatformIO. That means downloading a bunch of bits before you can actually build, but the core is Python based so it can be done as a normal user within a virtualenv. Here’s what I did:

# Make a directory to work in and change to it
mkdir sonoff-ws
cd sonoff-ws
# Build a virtual Python environment and activate it
virtualenv platformio
source platformio/bin/activate
# Install PlatformIO core
pip install -U platformio
# Clone Sonoff Tasmota tree
git clone https://github.com/arendst/Sonoff-Tasmota.git
cd Sonoff-Tasmota
# Checkout known to work release
git checkout v5.10.0
# Only build the sonoff firmware, not all the language variants
sed -i 's/;env_default = sonoff$/env_default = sonoff/' platformio.ini
# Force older version of espressif to get more reliable wifi
sed -i 's/platform = espressif8266$/&@1.5.0/' platformio.ini
# Edit the configuration to taste; essentially comment out all the USE_*
# defines and enable USE_MQTT_TLS
vim sonoff/user_config.h
# Actually build. Downloads a bunch of deps the first time.
platformio run

I’ve put my Sonoff-Tasmota user_config.h up in case it’s of help when trying to get up and going. At some point I need to try the latest version and see if I can disable enough to make it happy with MQTT/TLS, but for now I have an image that does what I need.)

Read the whole story
copyninja
2424 days ago
reply
India
Share this story
Delete

A more privacy-friendy blog

1 Comment

When I started this blog, I embraced some free services, like Disqus or Google Analytics. These services are quite invasive for users’ privacy. Over the years, I have tried to correct this to reach a point where I do not rely on any “privacy-hostile” services.

Analytics🔗

Google Analytics is an ubiquitous solution to get a powerful analytics solution for free. It’s also a great way to provide data about your visitors to Google—also for free. There are self-hosted solutions like Matomo—previously Piwik.

I opted for a simpler solution: no analytics. It also enables me to think that my blog attracts thousands of visitors every day.

Fonts🔗

Google Fonts is a very popular font library and hosting service, which relies on the generic Google Privacy Policy. The google-webfonts-helper service makes it easy to self-host any font from Google Fonts. Moreover, with help from pyftsubset, I include only the characters used in this blog. The font files are lighter and more complete: no problem spelling “Antonín Dvořák”.

Videos🔗

  • Before: YouTube
  • After: self-hosted

Some articles are supported by a video (like “OPL2LPT: an AdLib sound card for the parallel port“). In the past, I was using YouTube, mostly because it was the only free platform with an option to disable ads. Streaming on-demand videos is usually deemed quite difficult. For example, if you just use the <video> tag, you may push a too big video for people with a slow connection. However, it is not that hard, thanks to hls.js, which enables to deliver video sliced in segments available at different bitrates. Users with Java­Script disabled are still delivered with a progressive version of medium quality.

In “Self-hosted videos with HLS”, I explain this approach in more details.

Comments🔗

Disqus is a popular comment solution for static websites. They were recently acquired by Zeta Global, a marketing company and their business model is supported only by advertisements. On the technical side, Disqus also loads several hundred kilobytes of resources. Therefore, many websites load Disqus on demand. That’s what I did. This doesn’t solve the privacy problem and I had the sentiment people were less eager to leave a comment if they had to execute an additional action.

For some time, I thought about implementing my own comment system around Atom feeds. Each page would get its own feed of comments. A piece of Java­Script would turn these feeds into HTML and comments could still be read without Java­Script, thanks to the default rendering provided by browsers. People could also subscribe to these feeds: no need for mail notifications! The feeds would be served as static files and updated on new comments by a small piece of server-side code. Again, this could work without Javascript.

Day Planner by Fowl Language Comics
Fowl Language Comics: Day Planner or the real reason why I didn't code a new comment system.

I still think this is a great idea. But I didn’t feel like developing and maintaining a new comment system. There are several self-hosted alternatives, notably Isso and Commento. Isso is a bit more featureful, with notably an imperfect import from Disqus. Both are struggling with maintainance and are trying to become sustainable with an hosted version. Commento is more privacy-friendly as it doesn’t use cookies at all. However, cookies from Isso are not essential and can be filtered with nginx:

proxy_hide_header Set-Cookie;
proxy_hide_header X-Set-Cookie;
proxy_ignore_headers Set-Cookie;

In Isso, there is currently no mail notifications, but I have added an Atom feed for each comment thread.

Another option would have been to not provide comments anymore. However, I had some great contributions as comments in the past and I also think they can work as some kind of peer review for blog articles: they are a weak guarantee that the content is not totally wrong.

Search engine🔗

An way to provide a search engine for a personal blog is to provide a form for a public search engine, like Google. That’s what I did. I also slapped some Java­Script on top of that to make it look like not Google.

The solution here is easy: switch to DuckDuckGo, which lets you customize a bit the search experience:

<form id="lf-search" action="https://duckduckgo.com/">
  <input type="hidden" name="kf" value="-1">
  <input type="hidden" name="kaf" value="1">
  <input type="hidden" name="k1" value="-1">
  <input type="hidden" name="sites" value="vincent.bernat.im/en">
  <input type="submit" value="">
  <input type="text" name="q" value="" autocomplete="off" aria-label="Search">
</form>

The Java­Script part is also removed as DuckDuckGo doesn’t provide an API. As it is unlikely that more than three people will use the search engine in a year, this seems a good idea to not spend too much time on this non-essential feature.

Newsletter🔗

  • Before: RSS feed
  • After: still RSS feed but also a MailChimp newsletter

Nowadays, RSS feeds are far less popular they were before. I am still baffled as why a technical audience wouldn’t use RSS, but some readers prefer to receive updates by mail.

MailChimp is a common solution to send newsletters. It provides a simple integration with RSS feeds to trigger a mail each time new items are added to the feed. From a privacy point of view, MailChimp seems a good citizen: data collection is mainly limited to the amount needed to operate the service. Privacy-conscious users can still avoid this service and use the RSS feed.

Less Java­Script🔗

  • Before: third-party Java­Script code
  • After: self-hosted Java­Script code

Many privacy-conscious people are disabling Java­Script or using extensions like uMatrix or NoScript. Except for comments, I was using Java­Script only for non-essential stuff:

For mathematical formulae, I have switched from MathJax to KaTeX. The later is faster but also enables server-side rendering: it produces the same output regardless of browser. Therefore, client-side Java­Script is not needed anymore.

For sidenotes, I have turned the Java­Script code doing the transformation into Python code, with pyquery. No more client-side Java­Script for this aspect either.

The remaining code is still here but is self-hosted.

Memento: CSP🔗

The HTTP Content-Security-Policy header controls the resources that a user agent is allowed to load for a given page. It is a safeguard and a memento for the external resources a site will use. Mine is moderately complex and shows what to expect from a privacy point of view:2

Content-Security-Policy:
  default-src 'self' blob:;
  script-src  'self' blob: https://d1g3mdmxf8zbo9.cloudfront.net/js/;
  object-src  'self' https://d1g3mdmxf8zbo9.cloudfront.net/images/;
  img-src     'self' data: https://d1g3mdmxf8zbo9.cloudfront.net/images/;
  frame-src   https://d1g3mdmxf8zbo9.cloudfront.net/images/;
  style-src   'self' 'unsafe-inline' https://d1g3mdmxf8zbo9.cloudfront.net/css/;
  font-src    'self' about: data: https://d1g3mdmxf8zbo9.cloudfront.net/fonts/;
  worker-src  blob:;
  media-src   'self' blob: https://luffy-video.sos-ch-dk-2.exo.io;
  connect-src 'self' https://luffy-video.sos-ch-dk-2.exo.io https://comments.luffy.cx;
  frame-ancestors 'none';
  block-all-mixed-content;

I am quite happy having been able to reach this result. 😊


  1. You may have noticed I am a footnote sicko and use them all the time for pointless stuff. ↩︎

  2. I don’t have issue with using a CDN like CloudFront: it is a paid service and Amazon AWS is not in the business of tracking users. ↩︎

Read the whole story
copyninja
2460 days ago
reply
I did not knew about Isso and Commento. Need to integrate it to my blog
India
Share this story
Delete

Of course it runs NetBSD

1 Comment

“Of course it runs NetBSD”

Read the whole story
copyninja
2472 days ago
reply
what? Linux netbsd?
India
Share this story
Delete

About the privacy of the unlocking procedure for Xiaomi’s Mi 5s plus

1 Comment

My little girl decided the old OnePlus One of my wife had to take a swim in the toilets. So we had to buy a new phone. Since I know how bad standard ROMs are, I looked-up in the LineageOS list of compatible OS, and found out that the Xiaomi’s Mi 5s plus was not too bad, and we bought one. The phone itself looks quite nice, 64 bits, lots of RAM, nice screen, etc. Then I tried the procedure for unlocking…

First, you got to register on Xiaomi’s website, and request for the permission to unlock the deivce. That’s already bad enough: why should I ask for the permission to use the device I own as I am pleased to? Anyway, I did that. The procedure includes receiving an SMS. Again, more bad: why should I give-up such a privacy thing as my phone number? Anyway, I did it, and received the code to activate my website account. Then I started the unlock program in a virtualbox Windows XP VM (yeah right… I wasn’t expecting something better anyway…), and then, the program tells me that I need to add my Xiaomi’s account in the phone. Of course, it then sends a web request to Xiaomi’s server. I’m already not happy with all of this, but that’s not it. After all of these privacy breaches, the unlock APP tells me that I need to wait 72 hours to get my phone to account association to be activated. Since I wont be available in the middle of the week, for me, that means waiting until next week-end to do that. Silly…

Let’s recap. During this unlock procedure, I had to give-up:

  • My phone number (due to the SMS)
  • My phone ID (probably the EMEI was sent)
  • My email address (truth is: I could have given them a temporary email address)
  • Hours of my time understanding and run the stupid procedure

So my advice: if you want an unlocked Android device, do not choose Xiaomi, unless you’re ok to give up all of the above private information.

Read the whole story
copyninja
2490 days ago
reply
Many people have complained about the same.
India
Share this story
Delete

Integration of a Go service with systemd: socket activation

1 Share

In a previous post, I highlighted some useful features of systemd when writing a service in Go, notably to signal readiness and prove liveness. Another interesting bit is socket activation: systemd listens on behalf of the application and, on incoming traffic, starts the service with a copy of the listening socket. Lennart Poettering details in a blog post:

If a service dies, its listening socket stays around, not losing a single message. After a restart of the crashed service it can continue right where it left off. If a service is upgraded we can restart the service while keeping around its sockets, thus ensuring the service is continously responsive. Not a single connection is lost during the upgrade.

This is one solution to get zero-downtime deployment for your application. Another upside is you can run your daemon with less privileges—loosing rights is a difficult task in Go.1

The basics🔗

Let’s take back our nifty 404-only web server:

package main

import (
    "log"
    "net"
    "net/http"
)

func main() {
    listener, err := net.Listen("tcp", ":8081")
    if err != nil {
        log.Panicf("cannot listen: %s", err)
    }
    http.Serve(listener, nil)
}

Here is the socket-activated version, using go-systemd:

package main

import (
    "log"
    "net/http"

    "github.com/coreos/go-systemd/activation"
)

func main() {
    listeners, err := activation.Listeners(true) // ❶
    if err != nil {
        log.Panicf("cannot retrieve listeners: %s", err)
    }
    if len(listeners) != 1 {
        log.Panicf("unexpected number of socket activation (%d != 1)",
            len(listeners))
    }
    http.Serve(listeners[0], nil) // ❷
}

In ❶, we retrieve the listening sockets provided by systemd. In ❷, we use the first one to serve HTTP requests. Let’s test the result with systemd-socket-activate:

$ go build 404.go
$ systemd-socket-activate -l 8000 ./404
Listening on [::]:8000 as 3.

In another terminal, we can make some requests to the service:

$ curl '[::1]':8000
404 page not found
$ curl '[::1]':8000
404 page not found

For a proper integration with systemd, you need two files:

  • a socket unit for the listening socket, and
  • a service unit for the associated service.

We can use the following socket unit, 404.socket:

[Socket]
ListenStream = 8000
BindIPv6Only = both

[Install]
WantedBy = sockets.target

The systemd.socket(5) manual page describes the available options. BindIPv6Only = both is explicitely specified because the default value is distribution-dependent. As for the service unit, we can use the following one, 404.service:

[Unit]
Description = 404 micro-service

[Service]
ExecStart = /usr/bin/404

systemd knows the two files work together because they share the same prefix. Once the files are in /etc/systemd/system, execute systemctl daemon-reload and systemctl start 404.​socket. Your service is ready to accept connections!

Handling of existing connections🔗

Our 404 service has a major shortcoming: existing connections are abruptly killed when the daemon is stopped or restarted. Let’s fix that!

Waiting a few seconds for existing connections🔗

We can include a short grace period for connections to terminate, then kill remaining ones:

// On signal, gracefully shut down the server and wait 5
// seconds for current connections to stop.
done := make(chan struct{})
quit := make(chan os.Signal, 1)
server := &http.Server{}
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

go func() {
    <-quit
    log.Println("server is shutting down")
    ctx, cancel := context.WithTimeout(context.Background(),
        5*time.Second)
    defer cancel()
    server.SetKeepAlivesEnabled(false)
    if err := server.Shutdown(ctx); err != nil {
        log.Panicf("cannot gracefully shut down the server: %s", err)
    }
    close(done)
}()

// Start accepting connections.
server.Serve(listeners[0])

// Wait for existing connections before exiting.
<-done

Upon reception of a termination signal, the goroutine would resume and schedule a shutdown of the service:

Shutdown() gracefully shuts down the server without interrupting any active connections. Shutdown() works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down.

While restarting, new connections are not accepted: they sit in the listen queue associated to the socket. This queue is bounded and its size can be configured with the Backlog directive in the socket unit. Its default value is 128. You may keep this value, even when your service is expecting to receive many connections by second. When this value is exceeded, incoming connections are silently dropped. The client should automatically retry to connect. On Linux, by default, it will retry 5 times (tcp_syn_retries) in about 3 minutes. This is a nice way to avoid the herd effect you would experience on restart if you increased the listen queue to some high value.

Waiting longer for existing connections🔗

If you want to wait for a very long time for existing connections to stop, you do not want to ignore new connections for several minutes. There is a very simple trick: ask systemd to not kill any process on stop. With KillMode = none, only the stop command is executed and all existing processes are left undisturbed:

[Unit]
Description = slow 404 micro-service

[Service]
ExecStart = /usr/bin/404
ExecStop  = /bin/kill $MAINPID
KillMode  = none

If you restart the service, the current process gracefully shuts down for as long as needed and systemd spawns immediately a new instance ready to serve incoming requests with its own copy of the listening socket. On the other hand, we loose the ability to wait for the service to come to a full stop—either by itself or forcefully after a timeout with SIGKILL.

Waiting longer for existing connections (alternative)🔗

An alternative to the previous solution is to make systemd believe your service died during reload.

done := make(chan struct{})
quit := make(chan os.Signal, 1)
server := &http.Server{}
signal.Notify(quit,
    // for reload:
    syscall.SIGHUP,
    // for stop or full restart:
    syscall.SIGINT, syscall.SIGTERM)
go func() {
    sig := <-quit
    switch sig {
    case syscall.SIGINT, syscall.SIGTERM:
        // Shutdown with a time limit.
        log.Println("server is shutting down")
        ctx, cancel := context.WithTimeout(context.Background(),
            15*time.Second)
        defer cancel()
        server.SetKeepAlivesEnabled(false)
        if err := server.Shutdown(ctx); err != nil {
            log.Panicf("cannot gracefully shut down the server: %s", err)
        }
    case syscall.SIGHUP: // ❶
        // Execute a short-lived process and asks systemd to
        // track it instead of us.
        log.Println("server is reloading")
        pid := daemonizeSleep()
        daemon.SdNotify(false, fmt.Sprintf("MAINPID=%d", pid))
        time.Sleep(time.Second) // Wait a bit for systemd to check the PID

        // Wait without a limit for current connections to stop.
        server.SetKeepAlivesEnabled(false)
        if err := server.Shutdown(context.Background()); err != nil {
            log.Panicf("cannot gracefully shut down the server: %s", err)
        }
    }
    close(done)
}()

// Serve requests with a slow handler.
server.Handler = http.HandlerFunc(
    func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(10 * time.Second)
        http.Error(w, "404 not found", http.StatusNotFound)
    })
server.Serve(listeners[0])

// Wait for all connections to terminate.
<-done
log.Println("server terminated")

The main difference is the handling of the SIGHUP signal in ❶: a short-lived decoy process is spawned and systemd is told to track it. When it dies, systemd will start a new instance. This method is a bit hacky: systemd needs the decoy process to be a child of PID 1 but Go cannot daemonize on its own. Therefore, we leverage a short Python helper, wrapped in a daemonizeSleep() function:2

// daemonizeSleep spawns a daemon process sleeping
// one second and returns its PID.
func daemonizeSleep() uint64 {
    py := `
import os
import time

r, w = os.pipe()
pid1 = os.fork()
if pid1 == 0:
    os.close(r)
    pid2 = os.fork()
    if pid2 == 0:
        for fd in {w, 0, 1, 2}:
            os.close(fd)
        time.sleep(1)
    else:
        os.write(w, str(pid2).encode("ascii"))
        os.close(w)
else:
    os.close(w)
    print(os.read(r, 64).decode("ascii"))
`
    cmd := exec.Command("/usr/bin/python3", "-c", py)
    out, err := cmd.Output()
    if err != nil {
        log.Panicf("cannot execute sleep command: %s", err)
    }
    pid, err := strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
    if err != nil {
        log.Panicf("cannot parse PID of sleep command: %s", err)
    }
    return pid
}

During reload, there may be a small period during which both the new and the old processes accept incoming requests. If you don’t want that, you can move the creation of the short-lived process outside the goroutine, after server.Serve(), or implement some synchronization mechanism. There is also a possible race-condition when we tell systemd to track another PID—see PR #7816.

The 404.service unit needs an update:

[Unit]
Description = slow 404 micro-service

[Service]
ExecStart    = /usr/bin/404
ExecReload   = /bin/kill -HUP $MAINPID
Restart      = always
NotifyAccess = main
KillMode     = process

Each additional directive is significant:

  • ExecReload tells how to reload the process—by sending SIGHUP.
  • Restart tells to restart the process if it stops “unexpectedly”, notably on reload.3
  • NotifyAccess specifies which process can send notifications, like a PID change.
  • KillMode tells to only kill the main identified process—others are left untouched.

Zero-downtime deployment?🔗

Zero-downtime deployment is a difficult endeavor on Linux. For example, HAProxy had a long list of hacks until a proper—and complex—solution was implemented in HAproxy 1.8. How do we fare with our simple implementation?

From the kernel point of view, there is a only one socket with a unique listen queue. This socket is associated to several file descriptors: one in systemd and one in the current process. The socket stays alive as long as there is at least one file descriptor. An incoming connection is put by the kernel in the listen queue and can be dequeued from any file descriptor with the accept() syscall. Therefore, this approach actually achieves zero-downtime deployment: no incoming connection is rejected.

By contrast, HAProxy was using several different sockets listening to the same addresses, thanks to the SO_REUSEPORT option.4 Each socket gets its own listening queue and the kernel balances incoming connections between each queue. When a socket gets closed, the content of its queue is lost. If an incoming connection was sitting here, it would receive a reset. An elegant patch for Linux to signal a socket should not receive new connections was rejected. HAProxy 1.8 is now recycling existing sockets to the new processes through an Unix socket.

I hope this post and the previous one show how systemd is a good sidekick for a Go service: readiness, liveness and socket activation are some of the useful features you can get to build a more reliable application.

Addendum: identifying sockets by name🔗

For a given service, systemd can provide several sockets. To identify them, it is possible to name them. Let’s suppose we also want to return 403 error codes from the same service but on a different port. We add an additional socket unit definition, 403.socket, linked to the same 404.service job:

[Socket]
ListenStream = 8001
BindIPv6Only = both
Service      = 404.service

[Install]
WantedBy=sockets.target

Unless overridden with FileDescriptorName, the name of the socket is the name of the unit: 403.socket. Currently, go-systemd does not expose these names yet. However, they can be extracted from the LISTEN_FDNAMES environment variable:

package main

import (
    "log"
    "net/http"
    "os"
    "strings"
    "sync"

    "github.com/coreos/go-systemd/activation"
)

func main() {
    var wg sync.WaitGroup

    // Map socket names to handlers.
    handlers := map[string]http.HandlerFunc{
        "404.socket": http.NotFound,
        "403.socket": func(w http.ResponseWriter, r *http.Request) {
            http.Error(w, "403 forbidden",
                http.StatusForbidden)
        },
    }

    // Get socket names.
    names := strings.Split(os.Getenv("LISTEN_FDNAMES"), ":")

    // Get listening sockets.
    listeners, err := activation.Listeners(true)
    if err != nil {
        log.Panicf("cannot retrieve listeners: %s", err)
    }

    // For each listening socket, spawn a goroutine
    // with the appropriate handler.
    for idx := range names {
        wg.Add(1)
        go func(idx int) {
            defer wg.Done()
            http.Serve(
                listeners[idx],
                handlers[names[idx]])
        }(idx)
    }

    // Wait for all goroutines to terminate.
    wg.Wait()
}

Let’s build the service and run it with systemd-socket-activate:

$ go build 404.go
$ systemd-socket-activate -l 8000 -l 8001 \
>                         --fdname=404.socket:403.socket \
>                         ./404
Listening on [::]:8000 as 3.
Listening on [::]:8001 as 4.

In another console, we can make a request for each endpoint:

$ curl '[::1]':8000
404 page not found
$ curl '[::1]':8001
403 forbidden

  1. Many process characteristics in Linux are attached to threads. Go runtime transparently manages them without much user control. Until recently, this made some features, like setuid() or setns(), unusable. ↩︎

  2. Python is a good candidate: it’s likely to be available on the system, it is low-level enough to easily implement the functionality and, as an interpreted language, it doesn’t require a specific build step. ↩︎

  3. This is not an essential directive as the process is also restarted through socket-activation. ↩︎

  4. This approach is more convenient when reloading since you don’t have to figure out which sockets to reuse and which ones to create from scratch. Moreover, when several processes need to accept connections, using multiple sockets is more scalable as the different processes won’t fight over a shared lock to accept connections. ↩︎

Read the whole story
copyninja
2495 days ago
reply
India
Share this story
Delete
Next Page of Stories