Guaranteed Topic Delivery using ActiveMQ Virtual Destinations

Background

The original JMS spec first arrived in 2001 with JSR 914. At the time, several enterprise messaging systems were already widely available; however, each had its own unique features and mechanisms, so any software that wanted to talk on a given messaging bus had to be tightly coupled to the specific messaging system implementation in use.

Given that “tightly-coupled” and “enterprise system” are phrases you generally want to avoid pairing, there was a push to abstract the implementation-specific aspects away from the messaging client code. To address this, the JMS spec was developed to define a standardized, consistent programming interface that would work across different “JMS provider” implementations. The JMS spec, by its abstracted nature, could only define the lowest common denominator of messaging constructs so that they could apply to (or be adapted for) the widest number of messaging systems. JMS therefore defined two core message delivery mechanisms (also referred to as “destinations”): queues and topics.

  • Queues are used for point-to-point messaging (1-to-1), where a message sent to a given queue will only be delivered to a single consumer
  • Topics are used for pub/sub messaging (1-to-many), where a message sent to a topic will be delivered to all subscribers currently listening to that topic.

Beyond the producer/consumer topology differences between the two, there is another key aspect to consider: time dependency.

Time Dependency

A message sent to a queue will remain on that queue until a consumer removes it. In other words, a queue consumer can be brought online and all messages that were sent to the queue prior to the consumer coming online will be delivered to it.

However, as hinted above, subscribers need to have an active subscription at the time a message is published in order to receive it. Any messages published while a subscriber is not listening will not be delivered to that subscriber. There is one exception to this rule: durable subscriptions, which allow subscribers to receive all messages sent from the time they subscribe until they unsubscribe, even if the subscriber was offline when a message was sent.

There are practical limitations to durable subscriptions which reduce the number of use-cases where they are sufficient. One of the biggest constraints is that durable subscriptions don’t cover messages that were sent before the subscriber created the durable subscription. For example, if the sending system starts broadcasting messages at 7:00am and the receiving system registers a durable listener at 7:30am, all messages produced between 7:00am and 7:30am will not be delivered to the subscriber. Although there are some situations where this type of operation is sufficient, the more common requirement is that the receiving system get ALL messages from the sending system, whether the receiving system was alive at the time or not. What is needed is a pub/sub model but with the delivery guarantees that queues provide. This is where ActiveMQ virtual destinations can help.

Virtual Destinations

ActiveMQ Virtual Destinations are a way of mapping logical destinations to one or more actual destinations. There are a couple of types of virtual destinations available, but the one we will focus on is referred to as a Composite Topic. This particular virtual destination behaves as a topic destination for publishers, but we can map that topic to a series of queues (and/or topics) in order to achieve the delivery characteristics we are shooting for.

Side note: If the terminology hasn’t been confusing enough thus far, I deliberately did not use the phrase “virtual topic” above, even though it might seem like the better word choice, because “virtual topic” denotes a specific kind of ActiveMQ virtual destination! Virtual topics are similar to composite topics, except that virtual topics are simplified (dare I say opinionated) virtual destinations that messaging clients can create dynamically based on a naming convention (i.e., publishing to the ActiveMQ topic “VirtualTopic.FOO” will create a dynamic, virtual topic).

Composite Topics

Composite Topics are set up in the ActiveMQ broker configuration and allow for flexibility in designing messaging architecture.

Both of these aspects are key: we want the flexibility to map a single topic to multiple queues, and we need to define this setup in the broker’s configuration file so that the broker knows to create the physical queues on startup and can begin forwarding messages sent to the topic into them.

This is what it looks like inside the conf/activemq.xml:

<broker ...>
...
    <!-- Optional: Pre-define topic/queues -->
    <destinations>
       <topic physicalName="myLogicalTopic" />
       <queue physicalName="queueA" />
       <queue physicalName="queueB" />
    </destinations>

    <destinationInterceptors>
        <virtualDestinationInterceptor>
            <virtualDestinations>
                <compositeTopic name="myLogicalTopic">
                    <forwardTo>
                        <queue physicalName="queueA"/>
                        <queue physicalName="queueB"/>
                        <queue physicalName="queueC"/> <!-- This queue was not defined above to 
                                                            demonstrate that ActiveMQ will create 
                                                            a target destination on the first message -->
                    </forwardTo>
                </compositeTopic>
            </virtualDestinations>
        </virtualDestinationInterceptor>
    </destinationInterceptors>
</broker>

Producer/Consumer Setup

With the broker squared away, the only thing left is to add the appropriate code to the producer(s) and consumer(s). Fortunately, since the broker does the heavy lifting of forwarding messages sent to the logical topic behind the scenes, we simply set up each producer to publish to the topic and each consumer to connect to its assigned queue. From the messaging client’s perspective there is nothing special or exotic that needs to be configured.
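
For illustration, here is a minimal sketch of both sides using the stomp.py Python client, assuming the broker's STOMP transport connector is enabled on its default port 61613 and accepts the stock admin/admin credentials (the same flow applies to any plain JMS client): the producer publishes only to the composite topic, while a consumer reads from its assigned queue.

import time
import stomp

# Assumes an ActiveMQ broker on localhost with its STOMP connector enabled
# on the default port 61613 and admin/admin credentials; adjust as needed.
conn = stomp.Connection([('localhost', 61613)])

class QueueListener(stomp.ConnectionListener):
    # Note: older stomp.py releases use on_message(self, headers, body)
    # instead of a single frame argument.
    def on_message(self, frame):
        print('queueA received:', frame.body)

conn.set_listener('queueA-listener', QueueListener())
conn.connect('admin', 'admin', wait=True)

# Consumer side: read from the physical queue the broker forwards into.
conn.subscribe(destination='/queue/queueA', id=1, ack='auto')

# Producer side: publish to the logical composite topic only.
conn.send(destination='/topic/myLogicalTopic', body='hello virtual destinations')

time.sleep(2)   # give the broker a moment to forward and deliver
conn.disconnect()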

Trade-offs

Astute readers may have already guessed that there are some trade-offs involved in going this route. The most obvious one is that, the way we’ve configured it, each receiving system requires its own queue, effectively losing the ability for subscribers to register dynamically. In practice this isn’t much of an issue since a) the number of receiving systems is normally known and limited, and b) we could have easily added a physical topic to the mapping so that messages would also be forwarded to a standard topic. In addition, placing messages in queues rather than, say, durable subscriptions makes the operation much more visible and easier to work with, and even lets a receiving system load-balance by attaching multiple consumers to its queue.

Conclusion

Using virtual destinations, producers can enjoy the simplicity of a pub/sub model with the assurance that messages will be waiting for consumers to pull from the broker whenever they are up and ready. Though any robust messaging architecture will require more design, planning, and configuration than we’ve touched on here, virtual destinations are a tool worth being aware of that can greatly simplify some otherwise very difficult use-cases.

3-way Disk Mirrors With ZFSOnLinux

Background

ZFS is a member of the newer generation of filesystems that include advanced features beyond simple file storage. Its capabilities are quite extensive, covering a wide range of pain points encountered with previous filesystems. The Wikipedia page details them all nicely, but for the purposes of this post we will focus on its ability to create N-way sets of disk mirrors.

Traditionally, mirrored disk sets in Linux and other operating systems have been limited to two devices (note: devices in this context could be disks, partitions, or even other RAID groups, as is the case in RAID 10 setups). While mirroring has the benefit over other RAID levels in that each mirrored device contains a complete copy of the data, the two-device limit became inadequate as disk sizes ballooned. In the age of multi-TB drives, simply rebuilding a degraded mirrored array could actually cause the surviving device to fail, eliminating the very redundancy one was expecting.

ZFS addresses this particular problem in several ways: data checksums, self-healing, and smart resilvering that synchronizes only the blocks actually in use instead of blindly rebuilding full array members even when only 1% of the disk space is used.

And to top it off, ZFS also includes the ability to specify an arbitrary number (N) of devices in a mirrored set. In this post we will create a sample 3-way mirrored set using loopback devices and run a series of test scenarios against it.

For those unfamiliar, a loopback device allows you to expose a file as a block device. Using loopback devices we can create file-based “disks” to use as mirror array members in our test.

Testbed Setup

For this exercise I am using a fresh Debian Jessie (8.1) x86_64 vanilla system installed into a KVM/QEMU virtual machine. The kernel currently shipped with Jessie is 3.16.0-4-amd64 and the ZFSOnLinux package currently available for Debian is 0.6.4-1.2-1.

It should be especially noted that ZFS should only be used on 64-bit hosts.

Installation

Following the Debian instructions on the ZFSOnLinux website,  the following commands were run:

$ su -
# apt-get install lsb-release
# wget http://archive.zfsonlinux.org/debian/pool/main/z/zfsonlinux/zfsonlinux_6_all.deb
# dpkg -i zfsonlinux_6_all.deb
# apt-get update
# apt-get install debian-zfs

This will add /etc/apt/sources.list.d/zfsonlinux.list, install the software and dependencies, then proceed to build the ZFS/SPL kernel modules.

Preparing the loopback devices

Finding the first available loopback device

# losetup -a

If you see anything listed, change 1 2 3 in the commands below to start with the next available number and increment appropriately.

Creating the files

# for i in 1 2 3; do dd if=/dev/zero of=/tmp/zfsdisk_$i bs=1M count=250; done
250+0 records in
250+0 records out
262144000 bytes (262 MB) copied, 0.371318 s, 706 MB/s
250+0 records in
250+0 records out
262144000 bytes (262 MB) copied, 0.614396 s, 427 MB/s
250+0 records in
250+0 records out
262144000 bytes (262 MB) copied, 0.824889 s, 318 MB/s

Set up the loopback mappings

# for i in 1 2 3; do losetup /dev/loop$i /tmp/zfsdisk_$i; done

Verify the mappings

# losetup -a
/dev/loop1: [65025]:399320 (/tmp/zfsdisk_1)
/dev/loop2: [65025]:399323 (/tmp/zfsdisk_2)
/dev/loop3: [65025]:399324 (/tmp/zfsdisk_3)

Create the ZFS 3-Way Mirror

# zpool \
    create \
    -o ashift=12 \
    -m /mnt/zfs/mymirror \
    mymirror \
    mirror \
    /dev/loop1 \
    /dev/loop2 \
    /dev/loop3

A couple things to note:

  1.  -o ashift=12
    This tells ZFS to align along 4KB sectors (ashift is the base-2 exponent of the sector size, so 2^12 = 4096 bytes). It is generally a good idea to always set this option since modern disks use 4KB sectors, and once a pool has been created with a given sector size it cannot be changed later. The net result is that if you created a pool with 512-byte sectors, let’s say using 1TB drives, you couldn’t later change the sector size to 4KB when adding 3TB drives (resulting in abysmal performance on the newer drives). So as a rule of thumb, always set -o ashift=12.
  2.  -m /mnt/zfs/mymirror
    This indicates where this pool should be mounted.
  3.  /dev/loopN
    The devices that make up the mirrored set. If these were physical disks you would likely want to use the appropriate disk symlinks under /dev/disk/by-id/.

Verify The ZFS Pool

# zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
mymirror   244M   408K   244M         -     0%     0%  1.00x  ONLINE  -
# zpool status
  pool: mymirror
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    mymirror    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        loop1   ONLINE       0     0     0
        loop2   ONLINE       0     0     0
        loop3   ONLINE       0     0     0

errors: No known data errors

Poking The Bear

So now that we have our test 3-way mirror running, let’s test its resiliency.

!!! WARNING NOTE: ALTHOUGH ZFS IS BUILT TO RECOVER FROM ERRORS, ONLY RUN THE FOLLOWING COMMANDS IN A TEST ENVIRONMENT OTHERWISE YOU WILL SUFFER DATA LOSS!!!

Setting The Stage

Create a random file that takes up ~50% of the disk space:

# dd if=/dev/urandom of=/mnt/zfs/mymirror/test.dat bs=1M count=125
 125+0 records in
 125+0 records out
 131072000 bytes (131 MB) copied, 16.8152 s, 7.8 MB/s
# zpool list
 NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
 mymirror   244M   126M   118M         -    20%    51%  1.00x  ONLINE  -
# zpool scrub mymirror
# zpool status
 pool: mymirror
 state: ONLINE
 scan: scrub repaired 0 in 0h0m with 0 errors on Sun Jul 19 20:20:12 2015
 config:

NAME        STATE     READ WRITE CKSUM
 mymirror    ONLINE       0     0     0
   mirror-0  ONLINE       0     0     0
     loop1   ONLINE       0     0     0
     loop2   ONLINE       0     0     0
     loop3   ONLINE       0     0     0

Complete Corruption Of A Single Disk

Wipe the disk with all ones (rather than the zeros used to initialize it above, to demonstrate how ZFS resilvers):

# dd if=/dev/zero bs=1M count=250 | tr '\000' '\001' > /tmp/zfsdisk_3
 250+0 records in
 250+0 records out
 262144000 bytes (262 MB) copied, 0.708197 s, 370 MB/s

This will wipe out the ZFS disk label among everything else, simulating the state where a disk is alive but corrupt.

# zpool scrub mymirror
# zpool list
 NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
 mymirror   244M   127M   117M         -    21%    51%  1.00x  ONLINE  -

# zpool status
 pool: mymirror
 state: ONLINE
 status: One or more devices could not be used because the label is missing or
 invalid.  Sufficient replicas exist for the pool to continue
 functioning in a degraded state.
 action: Replace the device using 'zpool replace'.
 see: http://zfsonlinux.org/msg/ZFS-8000-4J
 scan: scrub repaired 0 in 0h0m with 0 errors on Sun Jul 19 20:39:45 2015
 config:

NAME        STATE     READ WRITE CKSUM
 mymirror    ONLINE       0     0     0
   mirror-0  ONLINE       0     0     0
     loop1   ONLINE       0     0     0
     loop2   ONLINE       0     0     0
     loop3   UNAVAIL      0     0     0  corrupted data

errors: No known data errors

Replacing the disk:

# zpool replace -o ashift=12 mymirror loop3
# zpool status
 pool: mymirror
 state: ONLINE
 scan: resilvered 126M in 0h0m with 0 errors on Sun Jul 19 20:42:51 2015
 config:

NAME        STATE     READ WRITE CKSUM
 mymirror    ONLINE       0     0     0
   mirror-0  ONLINE       0     0     0
     loop1   ONLINE       0     0     0
     loop2   ONLINE       0     0     0
     loop3   ONLINE       0     0     0

Note that only 126MB needed to be resilvered. ZFS only synchronizes blocks that are in use: not empty blocks, and not blocks that already match on the new device (which is why we corrupted the disk with ones instead of zeros, to demonstrate this).

Complete Corruption Of 2 Out Of 3 Disks

Check the file first:

# md5sum /mnt/zfs/mymirror/test.dat 
c253c4c5421d793f4fefe34af5a5ecc1  /mnt/zfs/mymirror/test.dat

Corrupt disk 2 and 3:

# dd if=/dev/zero bs=1M count=250 | tr '\000' '\001' > /tmp/zfsdisk_2
250+0 records in
250+0 records out
262144000 bytes (262 MB) copied, 0.660485 s, 397 MB/s
# dd if=/dev/zero bs=1M count=250 | tr '\000' '\001' > /tmp/zfsdisk_3
250+0 records in
250+0 records out
262144000 bytes (262 MB) copied, 0.718505 s, 365 MB/s
# zpool scrub mymirror
# zpool status
  pool: mymirror
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Jul 19 22:39:05 2015
config:

    NAME        STATE     READ WRITE CKSUM
    mymirror    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        loop1   ONLINE       0     0     0
        loop2   UNAVAIL      0     0     0  corrupted data
        loop3   UNAVAIL      0     0     0  corrupted data

errors: No known data errors
# md5sum /mnt/zfs/mymirror/test.dat 
c253c4c5421d793f4fefe34af5a5ecc1  /mnt/zfs/mymirror/test.dat

File still looks good. Now replace both drives (done in the following way so we can see the replacement in progress):

# zpool replace -o ashift=12 mymirror loop2 & \
  zpool replace -o ashift=12 mymirror loop3 & \
  sleep 1 && \
    zpool status &

state: ONLINE
 scan: resilvered 127M in 0h0m with 0 errors on Sun Jul 19 22:45:17 2015
config:

NAME             STATE     READ WRITE CKSUM
 mymirror        ONLINE       0     0     0
   mirror-0      ONLINE       0     0     0
     loop1       ONLINE       0     0     0
     replacing-1 UNAVAIL      0     0     0
       old       UNAVAIL      0     0     0  corrupted data
       loop2     ONLINE       0     0     0
     replacing-2 UNAVAIL      0     0     0
       old       UNAVAIL      0     0     0  corrupted data
       loop3     ONLINE       0     0     0

errors: No known data errors

And finally, both drives are fully replaced:

# zpool status
  pool: mymirror
 state: ONLINE
  scan: resilvered 127M in 0h0m with 0 errors on Sun Jul 19 22:45:17 2015
config:

    NAME        STATE     READ WRITE CKSUM
    mymirror    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        loop1   ONLINE       0     0     0
        loop2   ONLINE       0     0     0
        loop3   ONLINE       0     0     0

errors: No known data errors

And finally check the file:

# md5sum /mnt/zfs/mymirror/test.dat 
c253c4c5421d793f4fefe34af5a5ecc1  /mnt/zfs/mymirror/test.dat

Corrupting A File

In this test we’ll inject bad data into the file on the pool using the zinject testing tool included with ZFS.

# zinject -t data -f 1 /mnt/zfs/mymirror/test.dat
Added handler 5 with the following properties:
  pool: mymirror
objset: 21
object: 24
  type: 0
 level: 0
 range: all
# zpool scrub mymirror
# zpool status
  pool: mymirror
 state: ONLINE
  scan: scrub in progress since Sun Jul 19 21:54:23 2015
    88.4M scanned out of 127M at 3.84M/s, 0h0m to go
    2.12M repaired, 69.51% done
config:

    NAME        STATE     READ WRITE CKSUM
    mymirror    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        loop1   ONLINE       0     0     0  (repairing)
        loop2   ONLINE       0     0     0  (repairing)
        loop3   ONLINE       0     0     0  (repairing)

ZFS found the bad data and is in the process of repairing it.

# zpool status
  pool: mymirror
 state: ONLINE
  scan: scrub repaired 3M in 0h0m with 0 errors on Sun Jul 19 21:54:55 2015
config:

    NAME        STATE     READ WRITE CKSUM
    mymirror    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        loop1   ONLINE       0     0     0
        loop2   ONLINE       0     0     0
        loop3   ONLINE       0     0     0

errors: No known data errors

Finished repairing 3M of bad data.

Cleanup: If you are testing this yourself, remember to cancel the zinject handler afterwards:

# zinject 
 ID  POOL             OBJSET  OBJECT  TYPE      LVL  RANGE          
---  ---------------  ------  ------  --------  ---  ---------------
  5  mymirror         21      24      -           0  all
# zinject -c 5
removed handler 5

Partial Drive Corruption

Inject random bytes into one of the files backing a loopback device (a mirrored array member) with dd:

# dd if=/dev/urandom of=/tmp/zfsdisk_3 bs=1K count=10 seek=200000
10+0 records in
10+0 records out
10240 bytes (10 kB) copied, 0.00324266 s, 3.2 MB/s
# zpool scrub mymirror
# zpool status
  pool: mymirror
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub in progress since Sun Jul 19 22:08:26 2015
    127M scanned out of 127M at 31.8M/s, 0h0m to go
    24.8M repaired, 99.91% done
config:

    NAME        STATE     READ WRITE CKSUM
    mymirror    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        loop1   ONLINE       0     0     0
        loop2   ONLINE       0     0     0
        loop3   ONLINE       0     0   260  (repairing)

errors: No known data errors

ZFS found the corruption and is fixing it.

# zpool status
  pool: mymirror
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 24.8M in 0h0m with 0 errors on Sun Jul 19 22:08:30 2015
config:

    NAME        STATE     READ WRITE CKSUM
    mymirror    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        loop1   ONLINE       0     0     0
        loop2   ONLINE       0     0     0
        loop3   ONLINE       0     0   260

errors: No known data errors

24.8M of drive corruption found and fixed.

Conclusion

Setting up 3-way disk arrays using ZFS provides robust error-detection and recovery from a wide variety of damage scenarios. Its ability to target healing to only the affected data allows it to resilver efficiently and recover faster than traditional array configurations.

 

PostgreSQL Upserts

Background

SQL upserts combine an INSERT and an UPDATE into a single database operation, which allows rows to be added or modified in an atomic, concurrency-safe way.

For a time, upserts were somewhat of a sensitive subject in PostgreSQL circles. Several other DBMSes had some form of support for upserts/conflict resolution (albeit often with limited assurances), and refugees from those other DBMSes often got hung up when porting existing queries that relied heavily on this type of functionality.

Prior to PostgreSQL 9.5 there were various (and less-than-ideal) approaches to work around the lack of native support. The PostgreSQL manual provided a stored procedure example which:

  • attempts to update an existing row
  • detects whether any row was updated
  • attempts to insert a new row if not
  • catches and throws away duplicate key exceptions
  • repeats the steps above until a row has either been updated or inserted

Though this approach works as intended (and is more or less applicable to other DBMSes), it has a certain smell to it. Ignoring the performance aspects for a moment, one of the bigger practical pain points of this approach is that each table you might want to upsert into would need its own corresponding stored procedure.
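
To make that concrete, here is a hedged sketch of the same retry loop expressed in client code with psycopg2 rather than as a stored procedure (it uses the upsert_test1 table created in the demo below; the connection details are assumptions):

import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres password=password host=localhost")

def upsert_pre95(name, fav_color):
    """Emulate an upsert on PostgreSQL < 9.5: try UPDATE, else INSERT, retry on races."""
    cur = conn.cursor()
    while True:
        cur.execute("UPDATE upsert_test1 SET fav_color = %s WHERE name = %s",
                    (fav_color, name))
        if cur.rowcount > 0:          # an existing row was updated
            break
        try:
            cur.execute("INSERT INTO upsert_test1 (name, fav_color) VALUES (%s, %s)",
                        (name, fav_color))
            break
        except psycopg2.IntegrityError:
            # Another session inserted the same key first; roll back and retry the UPDATE.
            conn.rollback()
    conn.commit()
    cur.close()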

However, thanks to the work of dedicated PostgreSQL developers, native support finally made it into the software starting with the 9.5 release. The INSERT statement was expanded with an ON CONFLICT clause that gives it the ability to recover from unique constraint violations.

Demo

Test system setup

If you don’t already have a PostgreSQL testbed, you can use the instructions listed here to quickly start up a PostgreSQL Docker container.

Create test table

CREATE TABLE upsert_test1 (
     name       TEXT PRIMARY KEY,
     fav_color  TEXT
);

Insert row using upsert

INSERT INTO upsert_test1 (name, fav_color) 
                  VALUES ('Sally', 'blue')
         ON CONFLICT (name) 
            DO UPDATE SET fav_color = 'blue';

Confirm row values


SELECT * FROM upsert_test1;

 name  | fav_color 
-------+-----------
 Sally | blue

Update row using same upsert statement

INSERT INTO upsert_test1 (name, fav_color) 
                  VALUES ('Sally', 'green')
         ON CONFLICT (name) 
            DO UPDATE SET fav_color = 'green';

Confirm row values


SELECT * FROM upsert_test1;

 name  | fav_color 
-------+-----------
 Sally | green
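
If you are driving this from application code, the same statement parametrizes cleanly; below is a small sketch using psycopg2 (the connection details are assumptions) that also uses EXCLUDED to reference the incoming value, so a single statement handles any color without repeating it in the SET clause:

import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres password=password host=localhost")
cur = conn.cursor()

def set_fav_color(name, fav_color):
    # EXCLUDED refers to the row proposed for insertion, so the UPDATE branch
    # can reuse it instead of hard-coding the value as in the statements above.
    cur.execute("""
        INSERT INTO upsert_test1 (name, fav_color)
             VALUES (%s, %s)
        ON CONFLICT (name)
          DO UPDATE SET fav_color = EXCLUDED.fav_color
    """, (name, fav_color))

set_fav_color('Sally', 'purple')
conn.commit()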

Conclusion

As you can see above, the same upsert statement was able to insert a new row or update an existing row as the circumstances dictated. Though you shouldn’t use upserts recklessly, as there is still a performance cost, the additional flexibility they provide makes migrating from other databases much easier.

Setting Up A PostgreSQL Docker Sandbox

Motivation

Sandboxes are very handy for isolated testing and demonstration purposes. As it happens, I began writing a post that was going to use a PostgreSQL Docker container as a sandbox, and I decided it was better to break that setup out into its own post that can be referred to from future posts, rather than add complexity to a post where it isn’t the main focus.

Install Docker

For my docker host system I am using Debian 9 (Stretch). Installation was done following the official Get Docker CE for Debian instructions.

I won’t repeat the commands here as those instructions will be updated appropriately as packages, dependencies and Debian distributions change.

However I will mention I’m not a fan of piping raw URL output directly into sudo commands as some of the official instructions indicate. Anytime the instructions call for it I would suggest instead dumping the output to a file and sanity checking it before feeding it to a command with elevated privileges.

One last installation note: depending on your environment and requirements, you may want to give specific users the ability to control Docker. Completing the instructions indicated above adds a new ‘docker’ group, and adding users to this group is as easy as:

usermod -aG docker yourUsername

After which you’ll need to log in again for the group change to take effect (a simple ‘su - yourUsername’ on the command line where you plan to run docker commands suffices in the short term).

Firing Up The Sandbox

Downloading and starting the PostgreSQL sandbox is as easy as:

docker run \
            --name pgsandbox1 \
            -e "POSTGRES_PASSWORD=password" \
            -d \
            postgres:10

This tells docker to:

  • Run an (ephemeral) container named ‘pgsandbox1’
  • In detached mode (-d)
  • Based on the postgres:10 image tag
  • And set the password of the PostgreSQL superuser (by default ‘postgres’) to ‘password’ via an environment variable
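
As a side note, the command above does not publish PostgreSQL’s port to the host; everything in the next section goes through docker exec instead. If you would rather connect from the host, you could add -p 5432:5432 to the docker run command and then use any client library; here is a minimal psycopg2 sketch under that assumption:

import psycopg2

# Assumes '-p 5432:5432' was added to the docker run command above so the
# sandbox is reachable from the host on the default PostgreSQL port.
conn = psycopg2.connect(host='localhost', port=5432,
                        user='postgres', password='password', dbname='postgres')
cur = conn.cursor()
cur.execute('SELECT version()')
print(cur.fetchone()[0])
conn.close()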

Kicking Tires

First, we verify the container is running:

$ docker container list

CONTAINER ID   IMAGE         COMMAND                  CREATED         STATUS         PORTS      NAMES
3b445b6a2dc0   postgres:10   "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes   5432/tcp   pgsandbox1

Second, we exec psql inside the running docker:

$ docker exec \
          -it \
          pgsandbox1 \
          psql -U postgres

psql (10.3 (Debian 10.3-1.pgdg90+1))
Type "help" for help.

postgres=#

And as a final check we create a table, add a row to it and run a query:

postgres=# CREATE TABLE test (id SERIAL, name TEXT);
CREATE TABLE

postgres=# \d
             List of relations
 Schema |    Name     |   Type   |  Owner   
--------+-------------+----------+----------
 public | test        | table    | postgres
 public | test_id_seq | sequence | postgres
(2 rows)


postgres=# \d test
                            Table "public.test"
 Column |  Type   | Collation | Nullable |             Default              
--------+---------+-----------+----------+----------------------------------
 id     | integer |           | not null | nextval('test_id_seq'::regclass)
 name   | text    |           |          | 


postgres=# INSERT INTO test (name) VALUES ('Pat');
INSERT 0 1

postgres=# SELECT * FROM test;
 id | name 
----+------
 1 | Pat
(1 row)

Important reminder: the Docker container above is ephemeral. No volume is attached, so any data lives only inside the container and will be lost when the container is removed.

Final Notes

At the end of this we have a minimalist PostgreSQL instance ready for simple testing and examples. There is obviously plenty of room for enhancements and additional functionality. For those interested in going beyond the simple use-case above, I recommend reading through the base postgres Docker image page, which includes guidance on how best to do so.

Creating Self-Signed Certificates Using Keytool And OpenSSL

    Background

    Self-signed certificates are useful for testing as well as some limited real-world use cases. The instructions below cover the commands necessary to create simple self-signed certificates using either keytool or openssl. keytool, being included with Java JDK installations, is more likely to be available on Windows if OpenJDK or Oracle JDK is installed; Linux/Unix/OSX users, meanwhile, will typically have the more robust openssl program readily available to them.

    Note: To be clear, OpenSSL is also available for Windows however it is rarely found already installed on systems.

    Terminology

    Whenever cryptography is involved, you can expect a deluge of terminology about complex concepts. Fully fleshing out these concepts is beyond the scope of this piece, but it is worth roughly summarizing some of the terms as they apply to the operations used further down:

    • Keystore or Truststore – Container file designed to hold cryptographic materials such as keys, certificates, etc.
    • PFX – Effectively an alternate name for PKCS12 files. PFX is usually found within Microsoft contexts as PFX was initially a Microsoft format before it eventually became re-defined by RFC 7292 as PKCS12.
    • Key Pair – A public key and its corresponding private key together are commonly referred to as a “key pair”.
    • Private Key – The secret key of a key pair that should be available only to the owner or the owner’s programs/processes
    • Public Key – The public key of a key pair that is made available to one or more external parties to encrypt data that can (well…should) only be decrypted by someone with access to the corresponding secret key.
    • Certificate – A file format (typically X.509) that includes a key (commonly a public key) along with other relevant information such as expiration, issuer, etc.

    A Quick Note On Approach

    Several best practices are intentionally not followed in the commands below in order to emphasize the overall mechanics. Things like the keystore password being ‘password’, private key password not being set, and passwords included on the command line should not be followed outside an educational exercise context.

    Keytool Instructions

    As mentioned above, keytool is provided with Java JDK installations. It is capable of generating the necessary self-signed certificate materials but is otherwise fairly limited in its capabilities. For example, as of this writing there is no way to export a private key into its own dedicated file using keytool, only into another keystore container file. Another item to note is that prior to Java 9 the default keystore container format for keytool was JKS, a Java-specific format that limits its use by other software and tools. The examples below instruct keytool to use the more widely supported PKCS12 container format instead.

    Keytool primarily deals with keystores, so the approach followed below is to simultaneously generate a new keypair and store it in a new keystore, then afterwards export the public certificate to its own file.

    Create PKCS12 keystore container

    keytool \
        -genkeypair \
        -ext san=dns:localhost \
        -storetype PKCS12 \
        -keystore server.p12 \
        -storepass password \
        -alias myalias \
        -keyalg RSA \
        -keysize 4096 \
        -dname "C=,S=,L=,O=,OU=,CN=localhost"
    

    Export the certificate

    keytool \
      -exportcert \
      -rfc \
      -keystore server.p12 \
      -storepass password \
      -alias myalias \
      -file server.public.crt
    

    (Optional) Examine the keystore

    keytool \
      -list \
      -v \
      -storetype PKCS12 \
      -storepass password \
      -keystore server.p12
    

    OpenSSL Instructions

    OpenSSL approaches the creation and storage of cryptographic materials in a different sequence than keytool. In this case we will create the individual components (the private key and public cert) then package them up after into a PKCS12 file.

    Configure SubjectAltName

    If you create a certificate the way most posts show, you may get a warning from some clients similar to the following from python3:

    SubjectAltNameWarning: Certificate for localhost has no `subjectAltName`, falling back to check for a `commonName` for now. 
    This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
    

    As the warning indicates, there is no subjectAltName field in the certificate. Adding one requires either modifying the system openssl.cnf (or a copy of it) for OpenSSL pre-1.1.1, or using additional command line options with OpenSSL 1.1.1 and onwards. OpenSSL 1.1.1 was only just released on September 11, 2018 and hasn’t yet made its way into most distributions, so the commands below show the pre-1.1.1 method using a copy of openssl.cnf, which will continue to work when 1.1.1 becomes more widely available.

    Note: Your system’s openssl.cnf can be in several different places, common places to look:

    • /etc/ssl/openssl.cnf
    • /usr/lib/ssl/openssl.cnf
    • /usr/local/ssl/openssl.cnf
    • /usr/local/etc/openssl.cnf
    • /System/Library/OpenSSL/openssl.cnf (OSX)
    • /etc/pki/tls/openssl.cnf (RHEL)

    Prepare local openssl.cnf

    1. Make a local copy of the system openssl.cnf:
      cp /path/to/openssl.cnf openssl.cnf.tmp
    2. Add subjectAltName entry in copy:
      echo -e "[SAN]\nsubjectAltName=DNS:localhost\n" >> openssl.cnf.tmp

    Create certificate and key

    openssl req \
           -newkey rsa:4096 \
           -sha256 \
           -x509 \
           -days 3650 \
           -nodes \
           -subj "/C=C8/ST=ST/L=L/O=O/OU=OU/CN=localhost/emailAddress=me@example.com" \
           -keyout server.private.key \
           -config openssl.cnf.tmp \
           -extensions SAN \
           -out server.public.crt
    

    If you get the following error:

    problems making Certificate Request
    139634576708864:error:0D07A098:asn1 encoding routines:ASN1_mbstring_ncopy:string too short:../crypto/asn1/a_mbstr.c:102:minsize=2
    

    This indicates that one of the fields in the -subj text is either empty or too short.
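
    (Optional) If you would like to double-check that the subjectAltName actually made it into the certificate without wading through openssl x509 -text output, here is a small Python sketch; it relies on the third-party cryptography package, which is an assumption on my part and not required by anything above:

    from cryptography import x509
    from cryptography.hazmat.backends import default_backend

    # Load the PEM certificate produced above and list its SAN DNS entries.
    with open('server.public.crt', 'rb') as f:
        cert = x509.load_pem_x509_certificate(f.read(), default_backend())

    san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)
    print(san.value.get_values_for_type(x509.DNSName))   # expect ['localhost']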

    Assemble the PKCS12 container

    openssl pkcs12 \
      -export \
      -inkey server.private.key \
      -in server.public.crt \
      -passout pass:password \
      -out server.p12
    

    (Optional) Examine the keystore

    openssl pkcs12 \
      -nokeys \
      -info \
      -in server.p12 \
      -passin pass:password
    

    Testing with OpenSSL

    Whether you created your certificate and private key with keytool or openssl, you can use openssl to quickly test it.

    Note: If you created the keystore using keytool you’ll need to export the private key from the p12 container using openssl:

    openssl pkcs12 \
    -in server.p12 \
    -nocerts \
    -nodes \
    -passin pass:password | openssl rsa -outform PEM -out server.private.key

    Starting the test server

    The following command will start the TLS/SSL server bound to TCP port 4430 and instruct it to respond to simple HTTP GET requests.

    openssl s_server \
      -key server.private.key \
      -cert server.public.crt \
      -accept 4430 \
      -no_dhe \
      -www
    

    Testing client-side with web browser

    Browse to https://localhost:4430 in a browser (you may have to click past warnings). You should see the info text provided by the test server (ciphers supported, ciphers in common with the client, etc.).

    Testing client-side on the command line

    1. In a separate terminal window or tab, run: openssl s_client -connect localhost:4430
    2. Followed by: GET / HTTP/1.0
    3. Then the Enter key

    You should see the same HTML used to generate the page for the browser test.
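
    If you prefer a scripted client check in addition to the browser and openssl s_client tests above, here is a small Python sketch (run it while the test server is up, from the directory containing server.public.crt; treat it as a sanity check rather than a definitive test):

    import ssl
    import urllib.request

    # Trust only our self-signed certificate; 'localhost' must match the
    # CN/subjectAltName we put in it earlier.
    ctx = ssl.create_default_context(cafile='server.public.crt')

    with urllib.request.urlopen('https://localhost:4430/', context=ctx) as resp:
        print(resp.status)    # the s_server -www status page should return 200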

Parsing Apache Logs Into A Database

Background

Parsing Apache HTTPD access logs into a database structure allows you to easily run queries and reports against them to better understand web traffic and detect issues. By using the pip-installable apache_log_parser[1] package and the sqlite3 python module[2] that comes as part of the standard python library, we can quickly parse these access logs and insert entries into a sqlite3 database.

Approach Notes

A couple of things to note about the approach taken in the code:

  • sqlite3 was selected for its native support and ease of use, but other DBMSes can be similarly used
  • An auto-incremented id column was not included in the schema as sqlite3 does not recommend such a column unless necessary[4]. You will likely want to add one for other database systems
  • apache_log_parser produces a dictionary of apache log entry parts. Given this, the available dictionary keys depend on the log pattern of the access.log which is supplied to the make_parser() function
  • The columns in the code snippet are just a subset of the total keys/values returned by the parser. You can easily see all keys and values generated from a given line by calling pprint() on it (see the short example after this list)
  • An additional ‘date’ key and value are added in the code (in a sqlite date function friendly way) to allow for date based queries and criteria
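
For example, a throwaway way to inspect everything the parser extracts from a single line (assuming the same log pattern used in the code below):

from pprint import pprint
import apache_log_parser

parser = apache_log_parser.make_parser(
            "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
         )
line = '::1 - - [25/Nov/2018:20:13:28 -0500] "GET /index.html HTTP/1.1" 200 3380 "-" "User agent string"'
pprint(parser(line))    # dumps every key/value the parser produced for this line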

Code

#!/usr/bin/env python3

import sys
import sqlite3
import apache_log_parser

if len(sys.argv) != 2:
    print("Usage:", sys.argv[0], "/path/to/access.log")
    exit(1)

conn = sqlite3.connect('/tmp/logs.db')
cur = conn.cursor()

cur.execute("""
                CREATE TABLE IF NOT EXISTS logs (
                    status INTEGER,
                    request_method TEXT,
                    request_url TEXT,
                    date TEXT
                )
            """)

# Pattern below is from the LogFormat setting in apache2.conf/httpd.conf file
# You will likely need to change this value to the pattern your system uses
parser = apache_log_parser.make_parser(
                   "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
         )

log_file = sys.argv[1]

with open(log_file) as f:

    for line in f:

        d = parser(line)

        # Line below adds minimalistic date stamp column 
        # in format that sqlite3 date functions can work with
        d['date'] = d['time_received_datetimeobj'].date().isoformat()

        cur.execute("""
                        INSERT INTO logs ( status,  request_method,  request_url,  date)
                                  VALUES (:status, :request_method, :request_url, :date)
                        """, d)

cur.close()

conn.commit()
conn.close()

Example Run

Sample access.log entries:

# cat /var/log/apache2/access.log
::1 - - [25/Nov/2018:20:13:28 -0500] "GET /index.html HTTP/1.1" 200 3380 "-" "User agent string"
::1 - - [25/Nov/2018:12:15:26 -0500] "GET /notthere HTTP/1.1" 404 498 "-" "User agent string"

Run the code above:

$ ./httpd_log_parser.py /var/log/apache2/access.log

Examine the results:

$ sqlite3 -header -column /tmp/logs.db 'SELECT * FROM logs'
status      request_method  request_url  date      
----------  --------------  -----------  ----------
200         GET             /index.html  2018-11-25
404         GET             /notthere    2018-11-25

Sample Queries

After loading log entries you can start running useful queries on them.

  • Show number of requests by response status:
      SELECT count(1) as nReqs,
             status
        FROM logs
    GROUP BY (status)
    ORDER BY nReqs DESC;
    
    nReqs       status    
    ----------  ----------
    4           200       
    2           404   
    
  • What are the most common 404’d URLs:
      SELECT count(1) as nReqs,
             request_url
        FROM logs
       WHERE status = 404
    GROUP BY (request_url)
    ORDER BY nReqs DESC;
    
    nReqs       request_url
    ----------  -----------
    2           /notthere 
    

References

  1. GitHub – rory/apache-log-parser – https://github.com/rory/apache-log-parser
  2. [Python] sqlite3 — DB-API 2.0 interface for SQLite databases – https://docs.python.org/3/library/sqlite3.html
  3. [SQLite] Date And Time Functions – https://www.sqlite.org/lang_datefunc.html
  4. SQLite Autoincrement – https://www.sqlite.org/autoinc.html

Photo Slideshows Using Raspberry Pi

Background

At a family event just shy of a year ago I wanted to set up a slideshow (two in fact) to run continuously. The event was outdoors and leaving two laptops exposed to the elements for an extended period of time was obviously not very appealing. Instead a couple of Raspberry Pis together with the feh image viewer proved to be the perfect fit.

System Info

  • Pi: Raspberry Pi 3
  • OS: Raspbian 9 (Stretch)
  • Desktop: LXDE

Feh Installation and Test

First install feh:

apt-get install feh

Then run the following command in the terminal:


feh \
    --recursive \
    --randomize \
    --fullscreen \
    --quiet \
    --hide-pointer \
    --slideshow-delay 6 \
    yourPicDir

The command line switches are for the most part self-explanatory. The idea is to give feh a directory containing pictures you’ve copied to the Pi and then have it cycle through them randomly, displaying each for six seconds. Other options are documented in the man pages if you’d like to customize the usage.

To exit the slideshow press the Esc key.

Disabling Screen Blanking

By default the screen will blank after a few minutes of mouse and keyboard inactivity. In order to disable the screen blanking we’ll need to modify the lightdm.conf file:

nano /etc/lightdm/lightdm.conf

and add the following line under the [Seat:*] section:

xserver-command=X -s 0 -dpms

Auto-Start Slideshow On Boot

To have the slideshow automatically start on boot, first create a small shell script file to hold the feh command:

nano /home/pi/slideshow.sh

and paste the following:

#!/bin/bash


feh \
    --recursive \
    --randomize \
    --fullscreen \
    --quiet \
    --hide-pointer \
    --slideshow-delay 6 \
    yourPicDir

Be sure to make the new shell script executable:

chmod 755 /home/pi/slideshow.sh

And finally add it to the pi user’s LXDE autostart file:

echo "@/home/pi/slideshow.sh" >> /home/pi/.config/lxsession/LXDE-pi/autostart

When you reboot your Raspberry Pi it should now automatically start the slideshow.

To exit the slideshow, hit the Esc key and you will see the normal LXDE desktop.

Extracting RAW Photo Exif Data With Python

Background

I’ve always enjoyed taking pictures. Before smartphones put a camera in everyone’s pocket, I would use whatever (usually cheap) camera was handy. More recently I decided to get more into taking “real photographs” instead of just pictures. As anyone who has taken the dive into photography will tell you, there is much more complexity than you would normally expect, particularly when compared to point-and-shoot or smartphone cameras. As part of the journey, you will inevitably need to work with RAW image files. While RAW files provide much more information to work with, you may find support for the different manufacturers’ RAW formats lacking in tools and libraries you may have used in the past. This post briefly covers extracting Exif metadata from most RAW file formats with python.


pyexiv2

We will be using pyexiv2/py3exiv2 for this example, which are python wrappers around the exiv2 C++ library that does the actual image file processing.

  • pyexiv2 is the python 2.x version of the library
  • py3exiv2 is the python 3.x version

The code snippet below will work with either pyexiv2 or py3exiv2.

Setup

pyexiv2

For Python 2.x, the current Debian stable as of this writing (Debian 9 Stretch) already includes pyexiv2 in its package repository; you only need to run:

apt-get install python-pyexiv2

py3exiv2

For Python 3.x, since py3exiv2 isn’t available from apt, you’ll need to build it locally. To do this you’ll need to install some prerequisites:

apt-get install \
        build-essential \
        python-all-dev \
        libboost-python-dev \
        libexiv2-dev

Then use pip3 to download, build and install py3exiv2:

pip3 install py3exiv2

exiv2_demo.py

The following code snippet will:

  • Read a file given on the command line
  • Print all Exif tags found in the file
  • Print a specific tag value

Code

#!/usr/bin/env python3

# usage: exiv2_demo.py <file>

import sys
import pyexiv2

file = sys.argv[1]

md = pyexiv2.ImageMetadata(file)
md.read()

# print all exif tags in file
for m in md.exif_keys:
    print(m + "=" + str(md[m]))

# print specific tag 
aperture = float(md['Exif.Photo.FNumber'].value)
print("Aperture: F{}".format(aperture))

Note 1: One thing to keep in mind is that different Exif keys return different value types. For example, Exif.Photo.FNumber returns a Rational type, which was converted above to float to print the aperture the way it is commonly expressed. The ‘Exif metadata reference tables’ link in the References section has a list of tags and their respective types.

Note 2: Different RAW file formats (.ARW, .CR2, .NEF, etc) will often have a different set of Exif tags, particularly for lesser used/uncommon ones (for example Exif.Photo.LensModel).
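
Because of Note 2, it can be worth guarding lookups of the less common tags; a small sketch continuing from the code above, assuming (as with a dict) that a missing tag surfaces as a KeyError:

# Not every RAW format carries every tag, so guard optional lookups.
try:
    lens = md['Exif.Photo.LensModel'].value
except KeyError:
    lens = 'unknown'
print("Lens: {}".format(lens))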

Example Run

$ ./exiv2_demo.py test.sony.ARW

Exif.Photo.ExposureTime=<Exif.Photo.ExposureTime [Rational] = 1/125>
Exif.Photo.FNumber=<Exif.Photo.FNumber [Rational] = 40/10>
...
Exif.Photo.ISOSpeedRatings=<Exif.Photo.ISOSpeedRatings [Short] = 1600>
Aperture: F4.0

Dockerfile

If you’d like to experiment in a Docker sandbox, the following Dockerfile will spin up a docker container with py3exiv2 and run the script:

FROM python:3.6-stretch

RUN apt-get update && \
    apt-get install -y \
        build-essential \
        python-all-dev \
        libboost-python-dev \
        libexiv2-dev 

COPY \
    exiv2_demo.py \
    test.sony.ARW \
    ./

RUN pip3 install py3exiv2

CMD [ "python3", "./exiv2_demo.py", "test.sony.ARW" ]

References

  1. Exiv2 project: http://www.exiv2.org/
  2. py3exiv2 project: https://launchpad.net/py3exiv2
  3. EXIF metadata reference tables: http://www.exiv2.org/tags.html
  4. EXIF tag value chart: https://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/EXIF.html