Bits, Nibbles, Bytes, and the Impossible CAN Bus

In The Blacklist season 1, episode 17, “Ivan”, a super-hacker remotely deploys the airbag in a car and causes it to crash. He steals a Plot Device from the car, which will make him an even super-duperer-hacker, and runs off with it.

So far, so good. The Blacklist season 1 is excellent, and James Spader crushes it in every scene. I'll suspend my disbelief all day... Until about 7 minutes into the show, anyways...

Checking the OBD for hacks

If I could take a look at that car's ECU, I might be able to confirm whether it was electronically manipulated.

A car’s ECU is an “electronic control unit” and it’s responsible for controlling a subsystem in the vehicle. In the olden days, there might be one ECU controlling the engine. In the 2000s, there may have been closer to a dozen or so ECUs. Nowadays, there are at least a few dozen, climbing towards 100-200 or so. You'll have an ECU for controlling engine timing and performance (obviously), but you might also have them for cruise control, battery management, climate control, parking assist, and yes, airbag deployment.

The resident tech geek (Aram) pulls out his laptop and plugs a USB cable into something that seems to be going through the driver-side window. My guess would be that he’s plugged into the car’s OBD (on-board diagnostics) port. The OBD-II specification has been around since the mid-90s, so it's unsurprising that any given car in the show would have one.

The OBD-II specification dictates a minimum subset of information through standardized parameter IDs that each vehicle must provide, but each vendor can (and usually does) provide additional information on top. These can be additional diagnostics, or even access to other onboard systems. Often these non-standard parameter IDs are kept proprietary, and you need to pay-to-play.

This thing was definitely tampered with. The airbag deployed before the crash, not after.

As I didn't pay ludicrous sums of money to these car manufacturers, I can't say for certain, but it wouldn't surprise me if a car maintains a log of airbag deployment status and deployment time. Ditto for determining a "crash" status using onboard accelerometers, or other sensors/failure modes. Similar to an airplane's black box, personal and commercial vehicles often have event data recorders which record the last few seconds to minutes before a crash. On some vehicles, this black box information is stored with the airbag or restraints control modules.

However, access to the full forensic telemetry from the black box... I'm not sure if that would just be available over the OBD port, necessarily.

Maybe it wasn't the OBD at all

On second thought, I might have jumped the gun saying he was ever using the OBD port.

Aram might have had one of these expensive Bosch Crash Data Retrieval kits already plugged into the black box inside the car and was always reading full telemetry from the event data recorder.

My bad Aram, I shouldn't have doubted you.

CAN Bus? More like... CAN'T Bus

There's a message in the CAN Bus deliberately left here for someone to find.

... Dammit Aram... Things are starting to fall off the rails...

The CAN (controller area network) Bus is a standard that allows various ECUs to communicate with one another, indirectly. Like any event bus system, it allows de-coupled communication, where one component can put a message up on the bus, and whoever is interested in listening can consume the message and act on it. Everyone else will ignore it.

There are a whole slew of ISO specifications detailing the electrical signaling, throughput, timing, bus arbitration, priorities, and a hundred other implementation details.

What's important to know is that CAN communication is a protocol made up of fundamentally transient messages, not persistently stored ones. If you didn't catch it when it was sent, it's gone. You can’t "leave" a message in the CAN bus. That would be like saying "someone left a message in the Wi-Fi for us to find".
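To make the "transient" point concrete, here is a minimal sketch of a broadcast bus in Python. `ToyBus`, its methods, and the ID `0x123` are all made up for illustration; this is not a real CAN library and it ignores arbitration, timing, and everything else the specs cover. The only thing it tries to show is that a listener who attaches after a frame was sent never sees it, unless some other node chose to log it.

```python
from typing import Callable, List, Tuple

Listener = Callable[[int, bytes], None]


class ToyBus:
    """Hypothetical broadcast bus: frames go to current listeners and are never stored."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def attach(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def send(self, frame_id: int, data: bytes) -> None:
        # Deliver to whoever is listening right now; the bus keeps no copy.
        for listener in self._listeners:
            listener(frame_id, data)


bus = ToyBus()
early_log: List[Tuple[int, bytes]] = []
late_log: List[Tuple[int, bytes]] = []

# A node that was already on the bus decides to record what it hears.
bus.attach(lambda frame_id, data: early_log.append((frame_id, data)))
bus.send(0x123, b"hello")

# A node that shows up afterwards has nothing to find.
bus.attach(lambda frame_id, data: late_log.append((frame_id, data)))

print(early_log)  # [(291, b'hello')]
print(late_log)   # []
```

Any message "found" later has to come from something like `early_log`, that is, from a node that stored it, not from the bus itself.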

Being the charitable man that I am, I’ll assume that tech bro intended to say "a message sent over the CAN Bus was logged/put into a persistent queue by one of the ECUs (or the black box), for someone to find" ... Somehow...

I think I’ll be less charitable in just a second, though:

32 bytes

011010--

It's binary. It spells "Ivan." His digital signature.

I gave the previous oddball sentence a pass, but this one had me rewinding to make sure I heard correctly. He found a 32-byte CAN message?

Three-fingers meme

Vehicular digression

I’m not now, nor have I ever really been, a car guy. I have friends who can My Cousin Vinny a car from just the tire tracks, but that’s not me. So, I’m only a little bit embarrassed at how long it took for me to identify the car that gets hacked and crashes. As it was going to crash, they did the standard TV thing of removing identifying logos from the car.

Back of the Ford Taurus

Ford Taurus flipping

Front of the Ford Taurus

This car looked familiar though, I’m quite certain I’d even been in one when I was younger. I figured it was either a Ford or a Mercury (kinda the same thing at some point) and it had the late 90’s, early 00’s aesthetic.

After some internet sleuthing, I narrowed it down to a Ford Taurus, or a Mercury Sable. And after some more investigative work, I landed on it probably being a 2004 Ford Taurus.

Now, had I just continued watching the show for a few more minutes, I would have seen the newspaper clipping where they refer to it as a Ford Taurus and saved myself a bunch of time.

So, a 2004 Ford Taurus - and this episode of the Blacklist came out in early 2014, so was likely filmed in 2013.

Back on course

32 bytes

They found a 32-byte CAN message in a 2004 Ford Taurus. No, no they didn't.

The CAN 2.0 specification was released in 1991 and featured a maximum payload of 8 bytes per message. Even if we broke the protocol entirely and used every bit of a whole CAN frame as data (which receivers would treat as an error and reject), we'd get just under 14 bytes for a CAN 2.0A frame and about 16 bytes for a CAN 2.0B frame. Still not 32.

CAN Bus Frame, By Ken Tindell, Canis Automotive Labs Ltd, CC BY-SA 4.0
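The frame-size figures above can be sanity-checked by adding up the field widths of a classic CAN data frame. The sketch below uses the field widths as I understand them for the 11-bit (2.0A) and 29-bit (2.0B) frame formats, ignores bit stuffing, and treats the 3-bit interframe space as optional, since I'm assuming rather than knowing which bits were counted in the "just under 14" figure. The dictionary and function names are my own.

```python
# Field widths, in bits, of a classic CAN data frame carrying the maximum
# 8 data bytes. Bit stuffing is ignored, so frames on the wire can be longer.
BASE_FRAME_BITS = {
    "start_of_frame": 1,
    "identifier": 11,
    "rtr": 1,
    "ide": 1,
    "reserved": 1,
    "dlc": 4,
    "data": 8 * 8,
    "crc": 15,
    "crc_delimiter": 1,
    "ack_slot": 1,
    "ack_delimiter": 1,
    "end_of_frame": 7,
}

EXTENDED_FRAME_BITS = {
    "start_of_frame": 1,
    "base_identifier": 11,
    "srr": 1,
    "ide": 1,
    "extended_identifier": 18,
    "rtr": 1,
    "reserved": 2,
    "dlc": 4,
    "data": 8 * 8,
    "crc": 15,
    "crc_delimiter": 1,
    "ack_slot": 1,
    "ack_delimiter": 1,
    "end_of_frame": 7,
}

INTERFRAME_SPACE_BITS = 3


def frame_bytes(fields: dict, include_interframe_space: bool = False) -> float:
    """Total frame length in bytes, optionally counting the interframe space."""
    bits = sum(fields.values())
    if include_interframe_space:
        bits += INTERFRAME_SPACE_BITS
    return bits / 8


print(frame_bytes(BASE_FRAME_BITS))            # 13.5
print(frame_bytes(BASE_FRAME_BITS, True))      # 13.875
print(frame_bytes(EXTENDED_FRAME_BITS))        # 16.0
print(frame_bytes(EXTENDED_FRAME_BITS, True))  # 16.375
```

However the bits are counted, an entire classic frame lands at roughly half of 32 bytes, and only 8 of those bytes are actual payload.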

It wasn’t until the CAN FD (flexible data-rate) specification came out in 2012 that a (backwards compatible) protocol for handling up to 64 bytes per message was possible. The major IC vendors trickled CAN FD support out during 2014/2015, and given that we’re talking about car refresh cycles, the first cars fully using CAN FD likely didn’t roll out until 2017/2018.

They didn’t need 32 bytes in the first place

011010-- It's binary.

Fun fact, all data housed and transmitted to/from computers is binary.

CPUs are billions of transistors flipping between two distinct voltages, represented as 1's and 0's. The voltages used by various parts of a computer differ, though: a CPU core might switch between 0V and 0.8V (or 0V and 1.2V, or even 0V and 1.5V), your RAM might operate closer to 0V to 1.5V, other components might go up to 3.3V, your USB operates at ~5V, and some signaling protocols might go from -12V to +12V. But all digital signals are trying to go between two voltages to represent binary values.

Computer folks are weird

Conversationally, each 0 and 1 is a bit, and we group 8 of these to make a byte. The two terms are often, incorrectly, used interchangeably, which leads to confusion when talking about the speeds advertised by internet service providers. Classic example: I want to download a 150 megabyte file, and I have a 150 meg-per-second connection, so that should take 1 second, right?

Bell's download speeds in megabits-per-second

Wrong! ISPs typically specify their service in bits-per-second, not bytes-per-second. So, your connection is actually 8x slower than you probably think. Typically, uppercase 'B' is for bytes (100MB file) and lowercase 'b' is for bits (100Mb per second).
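As a quick check of that factor of 8, here is the arithmetic in Python. The function name is my own, and the result is an idealized best case that ignores protocol overhead, congestion, and everything else that makes real downloads slower.

```python
def download_seconds(file_megabytes: float, link_megabits_per_second: float) -> float:
    """Ideal transfer time: convert the file size to megabits, then divide by link speed."""
    file_megabits = file_megabytes * 8  # 8 bits per byte
    return file_megabits / link_megabits_per_second


print(download_seconds(150, 150))  # 8.0
```

So the 150 megabyte file on a 150 megabit-per-second connection takes 8 seconds at best, not 1.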

The 'M' and 'G' are just standard metric prefixes, right? WRONG! Maybe. Sometimes...

Depending on who you talk to, a megabyte (MB) is either 1,000,000 bytes, or 1,048,576 bytes. The first is the correct metric interpretation, while the second is the interpretation of most computer people. Being binary, computers operate in the base-two numeral system, while humans tend to stick to the base-ten decimal system. So, the computer people interpretation of a megabyte is 1024 * 1024 bytes (where 1024 is 2^10). The binary version has since been standardized under its own name, the mebibyte (a term I've never once used).
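The two interpretations are easy to compare side by side. This is just the arithmetic from the paragraph above, with constant names of my own choosing:

```python
MEGABYTE = 1000 ** 2   # metric: 10^6 bytes
MEBIBYTE = 1024 ** 2   # binary: 2^20 bytes

print(2 ** 10)              # 1024
print(MEGABYTE)             # 1000000
print(MEBIBYTE)             # 1048576
print(MEBIBYTE - MEGABYTE)  # 48576
```

The gap is a little under 5% at the "mega" level, and it widens with each larger prefix, since every step multiplies by 1024 on one side and 1000 on the other.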

If you don't have any reference for file sizes, this chart is super helpful for context.

Amusing side note: 4 bits is called a "nibble", which was also the name of the very first snake game I played, and it was written in QBasic. That was also the first time I tried programming, as we would modify the source code to change the snake colours and base speeds.

Back to words

So, we've got our bits, nibbles, and bytes. How does that turn into text?

Well, that's where character encoding comes into play. Just explaining the various character encodings, how they came about, how they're used, and their nuances could be a PhD thesis all on its own.

I'll heavily summarize by saying that in English, almost all of what you see on the internet is encoded as UTF-8 (Unicode Transformation Format, 8-bit) and, other than emojis, you could likely represent most of your reading as 7-bit ASCII (American Standard Code for Information Interchange) because UTF-8 was designed to be backwards compatible with ASCII.

ASCII

Originally standardized in the 1960's and updated over the next few decades, ASCII reserves 33 code points (aka characters, aka bytes in ASCII-land) as control characters while the rest of the available 7-bit range is used for printable characters. A control character is a piece of information that's used to provide metadata or perform an action on a data stream. Decades ago, these control codes were used with printers and other devices that interacted with the physical world, and the remnants of that usage persist in the control character naming. Hence typewriter-like words such as "carriage return" and "line feed".

The simplest example of a control character is delete/backspace. It's not a visible character on screen, but rather, when you press backspace, you're sending a byte of information to your computer to tell it to remove the previously visible character.

So, with 7 bits available (2^7 = 128 characters), 33 are used for control codes, which leaves 95 visible/printable characters. There is a lowercase alphabet, an uppercase alphabet, the digits 0 through 9, and then 33 characters for punctuation and symbols. Interestingly, "space" is counted among those 33 printable symbols, even though it's just a gap between other printable characters.
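Those counts can be verified with a few lines of Python. The sketch assumes the usual split of 7-bit ASCII, with control characters at 0x00 through 0x1F plus 0x7F (delete), and everything from 0x20 (space) through 0x7E treated as printable.

```python
import string

control = [c for c in range(128) if c < 0x20 or c == 0x7F]
printable = [c for c in range(128) if 0x20 <= c <= 0x7E]

print(len(control), len(printable))  # 33 95

letters_and_digits = (
    len(string.ascii_lowercase) + len(string.ascii_uppercase) + len(string.digits)
)
print(letters_and_digits)                   # 62
print(len(printable) - letters_and_digits)  # 33

# Python's own list of ASCII punctuation has 32 entries; space is the 33rd.
print(len(string.punctuation) + 1)          # 33
```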

All the 7-bit ASCII values are shown in this table, with their representative binary/decimal/hexadecimal representations:

Table of 7-bit ASCII characters and numerical encodings

Bytes are 8-bits though, right? So, we've got a free bit just waiting to be used. That's 128 free characters we can encode, right?

Yeah, and as that wasn't standardized, you end up in this weird scenario of different corporations or different locales using the top 128 characters for a hodgepodge of other meanings.

UTF-8

Unicode is a standard intended to capture all textual writing in the world, regardless of language (emojis are also part of Unicode, and are thus just Unicode characters). UTF-8 is a variable-length encoding of Unicode characters (also known as code points). The intention is that the more common the character, the fewer bytes it should consume. It is also backwards compatible with ASCII.

So the ASCII character set (the first 128 code points) still only requires 1 byte per character in UTF-8. The next ~2000 code points require 2 bytes and cover most Latin, Cyrillic, Greek, Hebrew, and other alphabets. 3-byte encodings cover most of Chinese, Japanese, and Korean (CJK), leaving 4-byte encodings for the remainder of CJK, emojis, and various other symbols.

For example, the letter A is assigned the code point U+0041 and is encoded in UTF-8 as the 1-byte 0x41.

The 💩 emoji is code point U+1F4A9, which is encoded as the 4-byte 0xF09F92A9.
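Python's built-in `str.encode` makes it easy to see the variable-length behaviour. The helper below is my own, and the "é" and "中" characters are extra examples I picked to fill in the 2-byte and 3-byte cases.

```python
def utf8_hex(character: str) -> str:
    """Return the UTF-8 encoding of a single character as uppercase hex."""
    return character.encode("utf-8").hex().upper()


for character in ["A", "é", "中", "💩"]:
    encoded = character.encode("utf-8")
    print(f"U+{ord(character):04X} -> {len(encoded)} byte(s): 0x{utf8_hex(character)}")

# U+0041 -> 1 byte(s): 0x41
# U+00E9 -> 2 byte(s): 0xC3A9
# U+4E2D -> 3 byte(s): 0xE4B8AD
# U+1F4A9 -> 4 byte(s): 0xF09F92A9
```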

Who cares?

Well, I do. Because, if I was still on the charity train (which I disembarked for a short stroll down skeptic’s lane), we could claim that there were actually four 8-byte messages "left in" the CAN Bus, representing the signature of the hacker.

32 bytes

011010--

It's binary. It spells "Ivan." His digital signature.

However, I've just shown that, regardless of which encoding we use (ASCII or UTF-8), the name "Ivan" only requires 4 bytes. So, the hacker's name could have always fit within the size of a classic CAN message, and this whole 32-byte debacle need never have happened.
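The four-byte claim is simple to confirm. In this sketch, `to_bits` is a helper of my own, and the 8-byte limit is the classic CAN payload size mentioned earlier.

```python
from typing import List

CLASSIC_CAN_MAX_PAYLOAD = 8  # bytes


def to_bits(text: str, encoding: str = "ascii") -> List[str]:
    """Encode text and return each resulting byte as an 8-character bit string."""
    return [format(byte, "08b") for byte in text.encode(encoding)]


name = "Ivan"
print(name.encode("ascii") == name.encode("utf-8"))            # True
print(len(name.encode("utf-8")))                               # 4
print(len(name.encode("utf-8")) <= CLASSIC_CAN_MAX_PAYLOAD)    # True
print(to_bits(name))  # ['01001001', '01110110', '01100001', '01101110']
```

Four bytes is 32 bits, so the name fits in a single classic CAN frame with half the payload to spare.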

Though, small shout out for the detail of not just saying random 1's and 0's, but actually starting to say the binary representation of "i":

"i" -> 0x69 -> 01101001

Final Thoughts

This trip down CAN Bus lane reminded me of the work I did some years ago with Redundant Ethernet, specifically High-availability Seamless Redundancy (HSR). At a conference in the UK in 2018, I spoke briefly with the person involved with in-car ethernet at BMW. They pointed out that the technologies we were discussing wouldn't make it to cars (or maybe even the design phase of cars) until 2025 (this year).

At the time I thought that was an absolutely ludicrous timeline, but then, thinking about manufacturers wanting to stick to what works, what's reliable, proven, and cheap - I could see why that was the case. It's more that they're dragged kicking and screaming into newer technologies as older ones can't support the bandwidth and throughput of autonomous car tech.

It's also worth pointing out that in this whole article, I didn't even touch on the fact that the car was remotely hacked, causing it to crash. Well, that's not far-fetched. Quite the opposite: this scene was prescient.

CAN Bus is insecure by default. The "security" is entirely based on access to the bus. Once you can access it, you can spam the bus causing a denial-of-service attack, listen to anything being transmitted, or, in this case, push inauthentic messages on to the bus and get some level of control over the vehicle.

In fact, a year after this episode aired, Fiat Chrysler recalled 1.4 million vehicles due to the infamous 2015 Jeep Cherokee hack. The craziest part was that the car was remotely hacked via the infotainment system, which is supposed to be air-gapped from the CAN Bus. However, there was a subsystem that had read-only access (hypothetically) to the CAN Bus. The researchers performed an unauthenticated firmware update to that subsystem, and bam, read-write access to the CAN Bus.

Now, what's even crazier is that I'd bet that cars are MORE hackable today than they were 10 years ago.