What is Bitgatt and why do we need it?

 3 years ago
source link: https://eng.fitbit.com/what-is-bitgatt-and-why-do-we-need-it/

At Fitbit we use Bluetooth Low Energy, commonly known as BLE, for particularly heavy data transfers that are more complex than the handful of common use cases for this extremely efficient radio technology. One of the perks of the job, in my opinion, is trying to deliver amazing experiences between wearables and cloud-connected mobile devices through heavy use of the various BLE technologies and specifications. Unfortunately, one side effect is that we find ourselves living in edge cases that Android OEMs, and sometimes even Google, didn’t anticipate when designing their APIs and implementing their Bluetooth drivers. We presently make limited or no use of the standard Bluetooth GATT profiles for low energy because none of them is sufficient for the variety of functionality we want from our wearables. Because of this, we have designed our own protocols to perform the operations our devices require to help you reach your health and fitness goals and to provide a robust app ecosystem.

The Android BLE API, at present, roughly mirrors the Broadcom developer API used for their chipsets. It is abstracted by an open source Linux library called “bluedroid”, with significant modifications (all contributed back by Google, naturally), and then bridged to Java using JNI. There are a number of adjustments to ensure some level of security and isolation between applications, but aside from that it is a fairly raw API that gives developers an amazing amount of power while limiting access in ways that are often frustrating. In some ways this is the worst of both worlds: you have the power to shoot yourself in the foot constantly, but not enough power to actually resolve the issues that your use of the API as documented will create. Bluetooth, as you can see below, is an incredibly complex layer cake of protocols and requirements that are very easy to miss, skip, or hack around for OEMs who are under time pressure to deliver these complex radio devices.

So you can see why this is hard…

We have refactored our Bluetooth implementations several times, at different levels, as we gained more knowledge of what was actually transpiring down in the Android OS stack and even below it in the OEM’s BSP stack. What we found was quite a wide variance of behaviors between Android OS versions, phone models, Bluetooth chipset vendors, and even cellular carriers. For example, on some Android handsets the antenna design is such that Bluetooth throughput is noticeably compromised when Wi-Fi and Bluetooth are both active. This creates issues for us, as we sometimes transfer megabytes of data over this channel. The user may also be using many different BLE and Bluetooth devices, over any of the available protocols, at any time. The Android API does not give the developer any feedback about the user’s other use of Bluetooth on the phone, so it is up to the developer’s code to infer what is happening in the environment and react. This is just one of the complexities of being a mobile Bluetooth developer.

Bluetooth all the things! Image by 200 Degrees from Pixabay

Several years ago, when we didn’t fully understand how these various inconsistencies could conspire to create odd bugs, it would sometimes take us months to troubleshoot a bug, reading thousands of lines of logs across dozens of handsets. We would try to reproduce a bug encountered on a lower-end device on our developer-spec flagship Android handset and fail. These issues led us to simplify our code, reducing threading, caching, and other complexities in an effort to make troubleshooting easier. This helped, but we still had odd issues that we couldn’t explain by reviewing our code alone. At this point, as the mobile team, we had enough knowledge of BLE as a spec, our tracker firmware, and the various chipsets’ drivers to start to triangulate the issue. More often than not, the cause was a complex interaction bug in which the chipset driver implementation either fails to comply with the BLE specification or interacts badly with a particular version of Android, leading to paralysis of the BLE API. When this occurs, it is resolvable only by restarting Bluetooth (it is still amazing that you can do this without user interaction) or by asking the user to clear their Bluetooth share (dropping the GATT databases).

Some of the potential solutions we considered were more aggressive, including turning Bluetooth off and on automatically and relying heavily on private methods. We understand, however, that this would create a horrible experience for a user who relies on Bluetooth for several things on their device. It would be difficult for them to know that our application was the one causing problems with their headphones, but we as Android users would not want an application behaving this way on our own phones, so we would not do this to our users.

We determined that the only way forward was to establish a deep understanding of how things were working such that we could work around these problems in the most expedient way.  Then we could quickly triage and report other issues to Google, leaving it for them to manage as stewards of the platform. To that end we needed to have a robust understanding of the state of our interaction with the GATT, prioritizing stability over speed.

In order to find issues quickly, we have split our implementation into three tiers. The lowest tier wraps our interactions with the GATT, the middle tier is our protocol implementation, and the highest tier is our business logic. The library that we are now open sourcing is the lowest tier, our GATT interaction. We do this in the hope that we can help other developers avoid some of the pitfalls we have run into with Android BLE, and to get help adapting to the issues we cannot avoid. This level of encapsulation and separation of concerns lets us run robust test harnesses, even at runtime, to verify that each tier of logic behaves as expected. That in turn lets us quickly identify problems with Android or the OEM’s Bluetooth implementation, so we avoid spending precious time trying to fix the unfixable and file issues against the correct party.
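As a rough illustration of the three tiers described above, here is a minimal plain-Java sketch. Every name in it (GattLayer, ProtocolLayer, SyncService, RecordingGatt) and the one-byte length-prefix framing are made up for this example; none of it is Bitgatt's API or Fitbit's protocol. The point is only that each tier depends on the one below through a narrow interface, so a recording test double can stand in for the real GATT.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Lowest tier: raw GATT interaction (the role Bitgatt plays). Hypothetical interface.
interface GattLayer {
    void write(byte[] payload);
}

// Middle tier: an invented protocol that prefixes each body with a one-byte length.
final class ProtocolLayer {
    private final GattLayer gatt;

    ProtocolLayer(GattLayer gatt) {
        this.gatt = gatt;
    }

    void send(byte[] body) {
        if (body.length > 255) {
            throw new IllegalArgumentException("body too long for one frame");
        }
        byte[] frame = new byte[body.length + 1];
        frame[0] = (byte) body.length;
        System.arraycopy(body, 0, frame, 1, body.length);
        gatt.write(frame);
    }
}

// Highest tier: business logic that knows nothing about GATT or framing.
final class SyncService {
    private final ProtocolLayer protocol;

    SyncService(ProtocolLayer protocol) {
        this.protocol = protocol;
    }

    void requestSync() {
        protocol.send("SYNC".getBytes(StandardCharsets.US_ASCII));
    }
}

// Test double for the lowest tier: records what the upper tiers asked it to write.
final class RecordingGatt implements GattLayer {
    final List<byte[]> writes = new ArrayList<>();

    @Override
    public void write(byte[] payload) {
        writes.add(payload.clone());
    }
}
```

With this shape, a harness can check the upper tiers against RecordingGatt alone. If those checks pass but a real phone still misbehaves, the fault is more likely below the GattLayer boundary.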

Bitgatt system diagram

What Bitgatt, or Fitbit Gatt, attempts to do is run transactions against the GATT database, the repository on both sides of the connection that is synchronized on a schedule and sits at the core of BLE. Bitgatt provides a single thread per connected peripheral that operates on its own queue to prevent the Android OS’s queue from filling. On several versions of Android, if the GATT queue is full when the device disconnects, then upon the next connection you will receive the same already-full GATT queue, represented by a “client_if” or client interface. Any GATT operations you attempt will go onto the end of this queue and never be sent, because the queue is closed. Managing our own queue is one example of the many design choices we made when building Bitgatt that prevent an Android developer from creating their own nightmares when heavily using BLE.
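The per-peripheral queue idea can be sketched in plain Java as below. This is not Bitgatt's implementation: the PeripheralQueues class, its method names, and keying queues by address string are assumptions for illustration. It shows only the general technique of a single worker thread per peripheral, with pending work dropped on disconnect.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one single-threaded queue per peripheral address.
final class PeripheralQueues {
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    // Transactions submitted for the same peripheral run one at a time, in order.
    <T> Future<T> submit(String address, Callable<T> transaction) {
        ExecutorService queue = queues.computeIfAbsent(address, addr ->
                Executors.newSingleThreadExecutor(task -> {
                    Thread worker = new Thread(task, "gatt-queue-" + addr);
                    worker.setDaemon(true); // do not keep the process alive for idle queues
                    return worker;
                }));
        return queue.submit(transaction);
    }

    // On disconnect, discard the queue so stale operations are not replayed later.
    // Returns the number of transactions that never started.
    int onDisconnected(String address) {
        ExecutorService queue = queues.remove(address);
        return queue == null ? 0 : queue.shutdownNow().size();
    }
}
```

Because the application owns this queue, it can decide how many operations are outstanding at once and can throw the backlog away on disconnect, rather than leaving that to the OS-level queue.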

Another significant issue that Bitgatt helps developers avoid is that, when writing and notifying quickly on some Android versions, the value provided in the characteristic is a pointer to the C structure holding the data. This means that after the next write, the value that the Android Java object points to is different from the value you received initially (not to mention that the GATT callbacks are delivered on the JNI binder thread, which creates its own issues if you trigger long-running operations in the callback). To prevent this, Bitgatt makes deep copies of the characteristics, services, and descriptors returned. In addition, constructing a characteristic or descriptor yourself, writing a value to it, and then attempting to update or notify this value on a remote device will fail, because the objects provided by the BluetoothGatt object carry an instance ID that a locally constructed object lacks. The BluetoothGatt object will actually accept one of these constructed data objects and then become stuck instead of rejecting the instance. Bitgatt uses its own object types for characteristics, descriptors, and services to prevent this possibility. There are dozens of other issues that Bitgatt will protect potential implementers from, but those are a couple of the most insidious.
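The defensive-copy idea can be shown without any Android classes. In the sketch below, CharacteristicSnapshot is a hypothetical type, not one of Bitgatt's own; it copies only the value bytes, whereas the library is described above as copying whole characteristics, services, and descriptors. It assumes the caller hands over an array that the stack may later overwrite.

```java
import java.util.Arrays;
import java.util.UUID;

// Hypothetical immutable snapshot of a characteristic's value at callback time.
final class CharacteristicSnapshot {
    private final UUID uuid;
    private final byte[] value;

    CharacteristicSnapshot(UUID uuid, byte[] liveValue) {
        this.uuid = uuid;
        // Copy immediately: the live array may be overwritten by the next write or notification.
        this.value = liveValue == null ? new byte[0] : Arrays.copyOf(liveValue, liveValue.length);
    }

    UUID uuid() {
        return uuid;
    }

    // Hand out a fresh copy so callers cannot mutate the snapshot either.
    byte[] value() {
        return value.clone();
    }
}
```

Taking the snapshot inside the callback, before handing work to another thread, also keeps long-running processing off the callback thread while still seeing the bytes that were actually delivered.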

Strategy Pattern

For issues that are specific to particular phone models, we have employed the strategy pattern. This prevents us from compromising the core of the implementation with model-, OS-, or carrier-specific hacks that would render it unmanageable in the future. Removing these strategies is fairly straightforward, as there are clear hooks where we employ each strategy. Once the device that has the specific issue is no longer a concern, the code can be deprecated and removed without a full set of regression tests.
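A minimal sketch of that arrangement follows. The strategy interface, the registry, the "reconnect delay" workaround, and the device fields are all invented for illustration and do not reflect Bitgatt's actual strategy classes. On Android the device description would typically come from android.os.Build, which is left out here to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of the handset the code is running on.
final class DeviceInfo {
    final String manufacturer;
    final String model;
    final int apiLevel;

    DeviceInfo(String manufacturer, String model, int apiLevel) {
        this.manufacturer = manufacturer;
        this.model = model;
        this.apiLevel = apiLevel;
    }
}

// One device-specific workaround, kept outside the core implementation.
interface ReconnectStrategy {
    boolean appliesTo(DeviceInfo device);

    long reconnectDelayMillis();
}

// Example strategy: wait before reconnecting on one (made-up) model.
final class ModelDelayStrategy implements ReconnectStrategy {
    private final String model;
    private final long delayMillis;

    ModelDelayStrategy(String model, long delayMillis) {
        this.model = model;
        this.delayMillis = delayMillis;
    }

    @Override
    public boolean appliesTo(DeviceInfo device) {
        return model.equals(device.model);
    }

    @Override
    public long reconnectDelayMillis() {
        return delayMillis;
    }
}

// The single hook the core code calls; deleting a strategy means removing one registration.
final class StrategyRegistry {
    private static final ReconnectStrategy DEFAULT = new ReconnectStrategy() {
        @Override
        public boolean appliesTo(DeviceInfo device) {
            return true;
        }

        @Override
        public long reconnectDelayMillis() {
            return 0L;
        }
    };

    private final List<ReconnectStrategy> strategies = new ArrayList<>();

    void register(ReconnectStrategy strategy) {
        strategies.add(strategy);
    }

    // First matching strategy wins; otherwise fall back to the default behavior.
    ReconnectStrategy select(DeviceInfo device) {
        for (ReconnectStrategy strategy : strategies) {
            if (strategy.appliesTo(device)) {
                return strategy;
            }
        }
        return DEFAULT;
    }
}
```

The core code only ever calls select, so retiring a workaround for an obsolete handset touches a registration line and one small class rather than the main connection logic.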

Finally, we have built a lab environment in which we can test the phone’s GATT implementation independently of any Fitbit business logic (outside of Bitgatt). We can simulate any tracker’s behavior and determine whether the mobile OS or our firmware is at fault. We run automated regressions against Bitgatt, and we also try to test when mobile OS updates roll out so that we detect issues, hopefully, before our users do. There is probably another entire blog post on our testing around GATT, but I’ll leave that to another developer :-).

It all started here – BT4.0 … Android phones that use this are still out there … © Raimond Spekking / CC BY-SA 4.0 (via Wikimedia Commons)

Android BLE has been something of a tragedy of the commons: every application has access to every other application’s peripherals, can interact with the radio in unfettered ways, and can consume BLE bandwidth in any way it sees fit. Ideally, and perhaps idealistically, working together on one or more quality libraries can help reduce the bandwidth wasted through unnecessary scanning, stop applications from fighting with each other, and keep the GATT queues healthy. We hope that you find Bitgatt useful, even just for inspiration, and we look forward to collaborating with the Android community on BLE. Even if you are just looking to do something simple with BLE, it would be best to start with one of the existing BLE frameworks, as you will be able to spend more of your time on your own functionality and less on troubleshooting someone else’s. Whatever your use of BLE, you will inevitably expand your feature set and eventually find yourself where we were, understanding that this technology is deeply powerful but also wildly complex. We are still learning, and we hope that you will learn with us.

https://github.com/fitbit/bitgatt

About the Author

Irvin Owens Jr – Principal Software Engineer.


Irvin has been working on Android at Fitbit for 4 years and has been focused on Bluetooth Low Energy for the past 3 years.  He has worked on everything from backend to full-stack, down to mobile, and embedded. He enjoys cycling around the Bay Area and playing basketball with his family. He is presently training (quaking silently) for an impending ride up Mt. Baldy, but he still may chicken out and settle for a metric century around the East Bay :-).

