Splunk - AES-GCM Decryption failed!

How I tracked an obscure error through a black box and back by leveraging Ghidra

Feb. 14, 2020

The Solution

Let's start things off right. For those of you who've ended up on this page searching for a solution to this issue, I won't make you read through this entire post to find it.

To resolve this issue, simply delete your passwords.conf files from the local folder within the affected apps and reconfigure those apps.
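If you'd rather not delete anything outright, here is a minimal sketch of the same fix that moves each file aside instead. The function name, the backup suffix, and the etc/apps/<app>/local layout under your Splunk home are assumptions for illustration; adjust them to your installation.

```python
import shutil
import time
from pathlib import Path


def backup_passwords_conf(splunk_home, app_names):
    """Move each listed app's local/passwords.conf aside instead of deleting it."""
    moved = []
    stamp = time.strftime("%Y%m%d%H%M%S")
    for app in app_names:
        # Assumed layout: <splunk_home>/etc/apps/<app>/local/passwords.conf
        conf = Path(splunk_home) / "etc" / "apps" / app / "local" / "passwords.conf"
        if conf.is_file():
            backup = conf.with_name(f"passwords.conf.bak-{stamp}")
            shutil.move(str(conf), str(backup))
            moved.append(backup)
    return moved
```

Apps without a passwords.conf are skipped. The affected apps still need to be reconfigured afterwards, exactly as with a plain delete.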

If you want to understand what makes this issue occur, and why deleting the passwords.conf file resolves the issue, please keep reading for the full breakdown.

The Cause

I initially encountered this error following a wildly unsuccessful upgrade of one of my company's on-prem Splunk heavy forwarders from Splunk 7.2.5.1 to Splunk 8.0.1. I say wildly unsuccessful because I encountered not one, but two completely undocumented issues while following the Splunk upgrade documentation. The documentation I followed can be found here and here. The whole upgrade process took over a week from start to finish, but I'm happy to say things are functional now. I did reach out to Splunk support multiple times for assistance, but (as of the time of this writing) they have yet to get back to me with a solution.

Let's start at the beginning . . .

The first issue I encountered undoubtedly played into the development of the second issue, so I'll touch on it briefly. Our heavy forwarders run on CentOS 7, so I performed the upgrade using the Splunk 8.0.1 RPM file available on Splunk's download page. Everything worked great until I got to step 8 in their upgrade instructions, where I was prompted for root account credentials. After providing them, I received the notification that "Job for Splunkd.service failed because the control process exited with error code." You can see this in the screenshot below.

splunk-service-failed-to-start.png

This one took me a few hours to figure out. It turned out that the Splunk installer had removed and recreated the Splunk service definition file (/etc/init.d/splunk) under the "root" user context instead of the "splunk" user context, which is the context under which Splunk was installed.

The series of events, as best I can tell, is as follows:

  • The Splunk RPM replaced the Splunk service definition file at /etc/init.d/splunk.
  • The upgrade process (correctly) installed Splunk under the user “splunk,” but created the service definition file under “root” user context.
  • When attempting to start the Splunk service, systemd launched the Splunk executable under “root” user context, as directed by the service definition file.
  • The Splunk binary checked the user context it was attempting to run under. Upon determining it was being started under “root” user context instead of “splunk” user context, the application terminated without logging an error message.

To resolve this issue, it was simply a matter of recreating the service definition file under the appropriate user context. You can see this process in the following screenshot.

splunk-service-definition-fixed.png

After taking the preceding screenshot, the Splunk service was running again! I wrote up a report, sent it to my team and to Splunk, and counted it as a victory. Everything was up and running normally again, life was good. Or so I thought . . .

The real problem rears its ugly head

Later that day, I updated the TAs to their latest version to ensure that they were able to run on Python 3. I deployed a new TA I'd written and was surprised to find it was non-functional. I thought to myself, What the hell?! It worked in dev! Thus began 3 entire days of troubleshooting.

The logs for my new TA were deeply unhelpful. They contained little more than a stack trace showing a 500 error when attempting to leverage Splunk's client library to access Splunk's internal password store. You can see the stack trace below.

2020-02-09 06:28:52,158 ERROR pid=23549 tid=MainThread file=base_modinput.py:log_error:309 | Traceback (most recent call last):
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/modinput_wrapper/base_modinput.py", line 113, in stream_events
    self.parse_input_args(input_definition)
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/modinput_wrapper/base_modinput.py", line 154, in parse_input_args
    self._parse_input_args_from_global_config(inputs)
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/modinput_wrapper/base_modinput.py", line 173, in _parse_input_args_from_global_config
    ucc_inputs = global_config.inputs.load(input_type=self.input_type)
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/splunktaucclib/global_config/configuration.py", line 272, in load
    input_item['entity']
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/splunktaucclib/global_config/configuration.py", line 177, in _load_endpoint
    **query
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/solnlib/packages/splunklib/binding.py", line 289, in wrapper
    return request_fun(self, *args, **kwargs)
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/solnlib/packages/splunklib/binding.py", line 71, in new_f
    val = f(*args, **kwargs)
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/solnlib/packages/splunklib/binding.py", line 679, in get
    response = self.http.get(path, all_headers, **query)
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/solnlib/packages/splunklib/binding.py", line 1183, in get
    return self.request(url, { 'method': "GET", 'headers': headers })
  File "/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py3/solnlib/packages/splunklib/binding.py", line 1244, in request
    raise HTTPError(response)
solnlib.packages.splunklib.binding.HTTPError: HTTP 500 Internal Server Error -- {"messages":[{"type":"ERROR","text":"Unexpected error \"\" from python handler: \"REST Error [500]: Internal Server Error -- Traceback (most recent call last):\n  File \"/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py2/splunktaucclib/rest_handler/handler.py\", line 117, in wrapper\n    for name, data, acl in meth(self, *args, **kwargs):\n  File \"/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py2/splunktaucclib/rest_handler/handler.py\", line 352, in _format_all_response\n    self._encrypt_raw_credentials(cont['entry'])\n  File \"/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py2/splunktaucclib/rest_handler/handler.py\", line 386, in _encrypt_raw_credentials\n    change_list = rest_credentials.decrypt_all(data)\n  File \"/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py2/splunktaucclib/rest_handler/credentials.py\", line 290, in decrypt_all\n    all_passwords = credential_manager._get_all_passwords()\n  File \"/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py2/solnlib/utils.py\", line 159, in wrapper\n    return func(*args, **kwargs)\n  File \"/opt/splunk/etc/apps/TA-1password/bin/ta_1password/aob_py2/solnlib/credentials.py\", line 272, in _get_all_passwords\n    clear_password += field_clear[index]\nTypeError: cannot concatenate 'str' and 'NoneType' objects\n\".  See splunkd.log for more details."}]}

The logs showed me that the Splunk client library was receiving a 500 error from the Splunk "storage/passwords" REST endpoint. The 500 error occurred when something returned a NoneType object instead of the string that the handler was expecting. This bit of information, while interesting, did not help me resolve the issue or even find its cause. I needed more information, so I took that log entry's advice and checked splunkd.log.
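To make that failure mode concrete, here is a minimal sketch of the pattern the traceback points at. Only the names clear_password and field_clear come from the stack trace; the function names and the surrounding loop are my own assumptions, not the actual solnlib code.

```python
def join_clear_fields(field_clear):
    """Concatenate credential chunks the way the traceback's failing line suggests."""
    clear_password = ""
    for index in range(len(field_clear)):
        # Raises TypeError as soon as one chunk is None instead of a string.
        clear_password += field_clear[index]
    return clear_password


def join_clear_fields_checked(field_clear):
    """Same idea, but report which chunks are missing before concatenating."""
    missing = [i for i, chunk in enumerate(field_clear) if chunk is None]
    if missing:
        raise ValueError(f"credential chunks {missing} have no clear-text value")
    return "".join(field_clear)
```

The first function fails with the same kind of TypeError seen above when any chunk is None. The second is one hypothetical way a handler could surface a clearer message.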

The splunkd.log file turned out to be an absolute mess of cryptic error messages. Here's a snippet for you.

02-11-2020 12:03:21.114 +0000 ERROR AesGcm - error:00000000:lib(0):func(0):reason(0)
02-11-2020 12:03:21.114 +0000 ERROR AesGcm - AES-GCM Decryption failed!
02-11-2020 12:03:21.114 +0000 ERROR Crypto - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.114 +0000 WARN  ConfigEncryptor - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.114 +0000 ERROR AesGcm - error:00000000:lib(0):func(0):reason(0)
02-11-2020 12:03:21.114 +0000 ERROR AesGcm - AES-GCM Decryption failed!
02-11-2020 12:03:21.114 +0000 ERROR Crypto - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.114 +0000 WARN  ConfigEncryptor - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.122 +0000 ERROR AdminManagerExternal - Stack trace from python handler:\nTraceback (most recent call last):\n  File "/opt/splunk/lib/python2.7/site-packages/splunk/admin.py", line 111, in init_persistent\n    hand.execute(info)\n  File "/opt/splunk/lib/python2.7/site-packages/splunk/admin.py", line 634, in execute\n    if self.requestedAction == ACTION_LIST:     self.handleList(confInfo)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunk_ta_mscs_rh_azureaccount.py", line 132, in handleList\n    AdminExternalHandler.handleList(self, confInfo)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/admin_external.py", line 51, in wrapper\n    for entity in result:\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 123, in wrapper\n    raise RestError(500, traceback.format_exc())\nRestError: REST Error [500]: Internal Server Error -- Traceback (most recent call last):\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 116, in wrapper\n    for name, data, acl in meth(self, *args, **kwargs):\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 353, in _format_all_response\n    self._encrypt_raw_credentials(cont['entry'])\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 387, in _encrypt_raw_credentials\n    change_list = rest_credentials.decrypt_all(data)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/credentials.py", line 289, in decrypt_all\n    all_passwords = credential_manager._get_all_passwords()\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/utils.py", line 159, in 
wrapper\n    return func(*args, **kwargs)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/credentials.py", line 272, in _get_all_passwords\n    clear_password += field_clear[index]\nTypeError: cannot concatenate 'str' and 'NoneType' objects\n\n
02-11-2020 12:03:21.122 +0000 ERROR AdminManagerExternal - Unexpected error "" from python handler: "REST Error [500]: Internal Server Error -- Traceback (most recent call last):\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 116, in wrapper\n    for name, data, acl in meth(self, *args, **kwargs):\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 353, in _format_all_response\n    self._encrypt_raw_credentials(cont['entry'])\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 387, in _encrypt_raw_credentials\n    change_list = rest_credentials.decrypt_all(data)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/credentials.py", line 289, in decrypt_all\n    all_passwords = credential_manager._get_all_passwords()\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/utils.py", line 159, in wrapper\n    return func(*args, **kwargs)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/credentials.py", line 272, in _get_all_passwords\n    clear_password += field_clear[index]\nTypeError: cannot concatenate 'str' and 'NoneType' objects\n".  See splunkd.log for more details.
02-11-2020 12:03:21.143 +0000 ERROR AesGcm - error:00000000:lib(0):func(0):reason(0)
02-11-2020 12:03:21.143 +0000 ERROR AesGcm - AES-GCM Decryption failed!
02-11-2020 12:03:21.143 +0000 ERROR Crypto - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.143 +0000 WARN  ConfigEncryptor - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.143 +0000 ERROR AesGcm - error:00000000:lib(0):func(0):reason(0)
02-11-2020 12:03:21.143 +0000 ERROR AesGcm - AES-GCM Decryption failed!
02-11-2020 12:03:21.143 +0000 ERROR Crypto - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.143 +0000 WARN  ConfigEncryptor - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.152 +0000 ERROR AdminManagerExternal - Stack trace from python handler:\nTraceback (most recent call last):\n  File "/opt/splunk/lib/python2.7/site-packages/splunk/admin.py", line 111, in init_persistent\n    hand.execute(info)\n  File "/opt/splunk/lib/python2.7/site-packages/splunk/admin.py", line 634, in execute\n    if self.requestedAction == ACTION_LIST:     self.handleList(confInfo)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunk_ta_mscs_rh_mscs_azure_resource.py", line 90, in handleList\n    mscs_util.check_account_isvalid(confInfo, self.getSessionKey(), account_type="azure")\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/mscs_util.py", line 104, in check_account_isvalid\n    account_configs = conf.get_all()\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/utils.py", line 159, in wrapper\n    return func(*args, **kwargs)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/conf_manager.py", line 241, in get_all\n    key_values = self._decrypt_stanza(name, stanza_mgr.content)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/conf_manager.py", line 126, in _decrypt_stanza\n    self._cred_mgr.get_password(stanza_name))\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/utils.py", line 159, in wrapper\n    return func(*args, **kwargs)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/credentials.py", line 118, in get_password\n    all_passwords = self._get_all_passwords()\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/utils.py", line 159, in wrapper\n    return func(*args, **kwargs)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/credentials.py", line 272, in _get_all_passwords\n    clear_password += 
field_clear[index]\nTypeError: cannot concatenate 'str' and 'NoneType' objects\n
02-11-2020 12:03:21.152 +0000 ERROR AdminManagerExternal - Unexpected error "" from python handler: "cannot concatenate 'str' and 'NoneType' objects".  See splunkd.log for more details.
02-11-2020 12:03:21.169 +0000 ERROR AesGcm - error:00000000:lib(0):func(0):reason(0)
02-11-2020 12:03:21.169 +0000 ERROR AesGcm - AES-GCM Decryption failed!
02-11-2020 12:03:21.169 +0000 ERROR Crypto - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.169 +0000 WARN  ConfigEncryptor - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.170 +0000 ERROR AesGcm - error:00000000:lib(0):func(0):reason(0)
02-11-2020 12:03:21.170 +0000 ERROR AesGcm - AES-GCM Decryption failed!
02-11-2020 12:03:21.170 +0000 ERROR Crypto - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.170 +0000 WARN  ConfigEncryptor - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.178 +0000 ERROR AdminManagerExternal - Stack trace from python handler:\nTraceback (most recent call last):\n  File "/opt/splunk/lib/python2.7/site-packages/splunk/admin.py", line 111, in init_persistent\n    hand.execute(info)\n  File "/opt/splunk/lib/python2.7/site-packages/splunk/admin.py", line 634, in execute\n    if self.requestedAction == ACTION_LIST:     self.handleList(confInfo)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunk_ta_mscs_rh_azureaccount.py", line 132, in handleList\n    AdminExternalHandler.handleList(self, confInfo)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/admin_external.py", line 51, in wrapper\n    for entity in result:\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 123, in wrapper\n    raise RestError(500, traceback.format_exc())\nRestError: REST Error [500]: Internal Server Error -- Traceback (most recent call last):\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 116, in wrapper\n    for name, data, acl in meth(self, *args, **kwargs):\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 353, in _format_all_response\n    self._encrypt_raw_credentials(cont['entry'])\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 387, in _encrypt_raw_credentials\n    change_list = rest_credentials.decrypt_all(data)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/credentials.py", line 289, in decrypt_all\n    all_passwords = credential_manager._get_all_passwords()\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/utils.py", line 159, in 
wrapper\n    return func(*args, **kwargs)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/credentials.py", line 272, in _get_all_passwords\n    clear_password += field_clear[index]\nTypeError: cannot concatenate 'str' and 'NoneType' objects\n\n
02-11-2020 12:03:21.178 +0000 ERROR AdminManagerExternal - Unexpected error "" from python handler: "REST Error [500]: Internal Server Error -- Traceback (most recent call last):\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 116, in wrapper\n    for name, data, acl in meth(self, *args, **kwargs):\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 353, in _format_all_response\n    self._encrypt_raw_credentials(cont['entry'])\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/handler.py", line 387, in _encrypt_raw_credentials\n    change_list = rest_credentials.decrypt_all(data)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/splunktaucclib/rest_handler/credentials.py", line 289, in decrypt_all\n    all_passwords = credential_manager._get_all_passwords()\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/utils.py", line 159, in wrapper\n    return func(*args, **kwargs)\n  File "/opt/splunk/etc/apps/Splunk_TA_microsoft-cloudservices/bin/splunktamscs/solnlib/credentials.py", line 272, in _get_all_passwords\n    clear_password += field_clear[index]\nTypeError: cannot concatenate 'str' and 'NoneType' objects\n".  See splunkd.log for more details.
02-11-2020 12:03:21.246 +0000 ERROR AesGcm - error:00000000:lib(0):func(0):reason(0)
02-11-2020 12:03:21.246 +0000 ERROR AesGcm - AES-GCM Decryption failed!
02-11-2020 12:03:21.247 +0000 ERROR Crypto - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.247 +0000 WARN  ConfigEncryptor - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.247 +0000 ERROR AesGcm - error:00000000:lib(0):func(0):reason(0)
02-11-2020 12:03:21.247 +0000 ERROR AesGcm - AES-GCM Decryption failed!
02-11-2020 12:03:21.247 +0000 ERROR Crypto - Decryption operation failed: AES-GCM Decryption failed!
02-11-2020 12:03:21.247 +0000 WARN  ConfigEncryptor - Decryption operation failed: AES-GCM Decryption failed!

The first thing I noticed is that my new TA was not the only one having issues - it seems that the Splunk-provided Azure TA was also having issues. This made the problem significantly more interesting. If multiple, unrelated applications are having the exact same issue when attempting to access the same resource, the problem is likely not within the application - it's probably within the resource. After an hour or two spent Googling, I determined that there was no easy solution available online.
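Spotting which apps are affected by eye gets tedious in a log like this. Here is a rough sketch of how the app names could be pulled out of the handler stack traces. The regular expression and function name are my own, and the sketch assumes apps live under an etc/apps/ directory, as they do in the paths above.

```python
import re
from collections import Counter

# Captures the app folder name from paths such as /opt/splunk/etc/apps/<app>/bin/...
APP_PATH = re.compile(r'/etc/apps/([^/\s"]+)/')


def affected_apps(log_lines):
    """Count, per app, the log lines that mention a NoneType failure."""
    counts = Counter()
    for line in log_lines:
        if "NoneType" not in line:
            continue
        # Count each app once per log line, however many frames mention it.
        counts.update(set(APP_PATH.findall(line)))
    return counts
```

Feeding it the lines of splunkd.log returns a Counter keyed by app folder name, which makes it easy to see whether more than one app is hitting the same failure.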

Time for some reverse engineering

I started by taking a peek at the Splunk admin REST API handler. This component, written in Python, is one that a Splunk developer can use to create a custom Extensible Admin Interface (EAI) handler. You can find it in any old Splunk installation at $SplunkHome/lib/python$version/site-packages/splunk/admin.py. I simply opened it in VS Code and began reading.

This proved to be a bad place to start. The MConfigHandler class provides boilerplate code for quickly getting a new REST handler up and running, but it abstracts quite a lot and hides even more in a sea of PEP-8 violations and other bad practices. While sussing out how the handler works, I quickly became frustrated and looked for a faster way to search. I realized the unique error message "AES-GCM Decryption failed!" would likely be my best lead. Thus began the grepping.

It took an hour or two of fruitlessly limiting my search to Python files before I mistyped a grep command and accidentally searched the whole folder. You can see the result below.

bash-4.2$ grep -Rin "Decryption failed" .
Binary file ./bin/splunkd matches
Binary file ./bin/mongod matches
Binary file ./bin/mongod-3.4 matches
Binary file ./bin/mongod_cc matches
Binary file ./lib/libssl.so matches
Binary file ./lib/libssl.so.1.0.0 matches
^C
bash-4.2$ grep -Rin "AES-GCM Decryption failed" .
Binary file ./bin/splunkd matches

Well, of course the error message originates within the splunkd binary itself! These things can never be easy. Not one to be deterred, I decided to take a peek into the mysterious black box that is splunkd.
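For anyone without grep handy, here is a rough Python equivalent of that search. It is a sketch only: the function name is made up, it reads each file fully into memory, and unlike grep -i it is case-sensitive.

```python
from pathlib import Path


def files_containing(root, needle):
    """Return every file under root whose raw bytes contain needle, binaries included."""
    needle_bytes = needle.encode("utf-8")
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except OSError:
            # Unreadable files are skipped rather than aborting the search.
            continue
        if needle_bytes in data:
            matches.append(path)
    return matches
```

The important part is the same lesson as above: search bytes, not just text files, or a message compiled into a binary will never show up.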

Hello Ghidra, my old friend

Ghidra is an amazing utility. It's a reverse engineering tool graciously gifted to the community by our friends at the NSA. You can learn more about it at the Ghidra website. Don't worry, they swear there are no back doors in it.

While I primarily use Ghidra for exploit development, it seemed like a perfect tool to help solve this problem, so I dropped a copy of the splunkd binary into Ghidra and tried to decompile it. One of Ghidra's many neat features is that it's not just a disassembler, it's a full decompiler. This means that it takes the machine code from an input binary, converts it into assembly, and then tries to convert that assembly into a higher-level language such as C. It does an amazing job at this... provided you take the time to give it the resources it needs. You can see the result of my first-pass decompilation attempt below.

Skipping empty section[.tm_clone_table]
Unsupported Thread-Local Symbol not loaded: _ZSt15__once_callable
Unsupported Thread-Local Symbol not loaded: _ZSt11__once_call
Unsupported Thread-Local Symbol not loaded: _ZN6google8protobuf8internal9ArenaImpl13thread_cache_E
Elf Relocation Error: Type = 18 (0x12) at 0346dcb0, Symbol = _ZSt15__once_callable: TLS symbol relocation not yet supported
Elf Relocation Error: Type = 18 (0x12) at 0346dcb8, Symbol = _ZSt11__once_call: TLS symbol relocation not yet supported
Elf Relocation Error: Type = 18 (0x12) at 0346e088, Symbol = _ZN6google8protobuf8internal9ArenaImpl13thread_cache_E: TLS symbol relocation not yet supported
  [libjemalloc.so.2] -> not found
  [librt.so.1] -> not found
  [libpcre2-8.so] -> not found
  [libxml2.so.2] -> not found
  [libxslt.so.1] -> not found
  [libssl.so.1.0.0] -> not found
  [libxmlsec1.so.1] -> not found
  [libxmlsec1-openssl.so.1] -> not found
  [libcrypto.so.1.0.0] -> not found
  [libdl.so.2] -> not found
  [libarchive.so.13] -> not found
  [libbz2.so.1] -> not found
  [libsqlite3.so.0] -> not found
  [libz.so.1] -> not found
  [libmongoc-1.0.so.0] -> not found
  [libbson-1.0.so.0] -> not found
  [libm.so.6] -> not found
  [libpthread.so.0] -> not found
  [libc.so.6] -> not found
----- [splunkd] Resolve 858 external symbols -----
Unresolved external symbols which remain: 858

Ghidra wasn't able to fully decompile the binary on this run because it couldn't resolve all of the external symbols within it. This is easy enough to fix, as Ghidra also provides a list of the libraries containing the missing external symbols. It turns out you can find most of these libraries in the $SplunkHome/lib folder, provided you remember to resolve symbolic links. I had to grab the following libraries from the affected system before I was able to fully decompile splunkd:

  • libc.so.6
  • libdl.so.2
  • libm.so.6
  • libpthread.so.0
  • librt.so.1
  • libxml2.so.2

Once I had those libraries loaded into Ghidra, I re-added the splunkd binary and was greeted with the message: Unresolved external symbols which remain: 0. We're off to a good start.

Now Ghidra doesn't exactly have the ability to "CTRL+F" for a string, but it can get close. I needed to find the string "AES-GCM Decryption failed!" so that I could look for references to it within the application. So I kicked off the string finder (found under Search > Strings), set the minimum length to 20, and let it run. This generates a list of all strings of at least 20 characters within the binary. From there, it's possible to apply a filter. Filtering on "AES" narrowed down the results enough for me to find what I was looking for.
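Outside of Ghidra, the same kind of search can be approximated in a few lines. This is only a sketch of the idea (printable ASCII runs of a minimum length, then a substring filter), not how Ghidra implements it, and the function name is my own.

```python
import re


def find_strings(data, min_length=20, contains=""):
    """Return (offset, text) for printable ASCII runs of at least min_length bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    results = []
    for match in pattern.finditer(data):
        text = match.group().decode("ascii")
        if contains in text:
            results.append((match.start(), text))
    return results
```

Calling find_strings(open("splunkd", "rb").read(), 20, "AES") would list candidate strings with their file offsets, though those offsets are positions in the file rather than the addresses Ghidra displays.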

found-string-aes-gcm-decryption-failed.png

Bingo. Double-clicking on that took me to the exact spot in the binary where that string was located. Additionally, I was able to see that there's only a single reference to that location within the splunkd binary.

0x02aaf649.png

And double-clicking on that reference took me to the offending function within splunkd. You can see the decompiled source code below.

/* AesGcmKey::decrypt(Str*, Str const&, Str const&, unsigned long, unsigned long) const */

void __thiscall decrypt(AesGcmKey *this,Str *param_1,Str *param_2,Str *evpAAD,ulong param_4,ulong param_5)

{
  uchar *iv;
  char cVar1;
  int evpSuccessful;
  int iVar2;
  uchar *ptr;
  EVP_CIPHER *cipher;
  EncryptionAlgorithmException *this_00;
  Exception *this_01;
  undefined8 *this_02;
  ulong uVar3;
  char **plaintext;
  long lVar4;
  int local_320;
  int evpOutputLength;
  EVP_CIPHER_CTX *evpCipherContext [2];
  uchar *local_2d8;
  ulong local_2d0;
  uchar local_2c8 [16];
  undefined *local_2b8 [2];
  undefined auStack680 [16];
  undefined *local_298 [2];
  undefined auStack648 [16];
  undefined *local_278 [2];
  undefined auStack616 [16];
  char *local_258;
  ulong local_250;
  char acStack584 [496];
  char **local_58;
  ulong local_50;
  long local_40;
  uchar *ciphertext;
  
  local_40 = __stack_chk_guard;
  if (*this == (AesGcmKey)0x0) {
                    /* WARNING: Subroutine does not return */
    __assert_fail("_initialized","/opt/splunk/src/framework/auth/AesGcm.cpp",0xcd,"void AesGcmKey::decrypt(Str*, const Str&, const Str&, size_t, size_t) const");
  }
  if (0x2fffff < *(ulong *)(param_2 + 8)) {
    this_02 = (undefined8 *)__cxa_allocate_exception(0x28,0);
                    /* try { // try from 0181dabd to 0181dac1 has its CatchHandler @ 0181dbaa */
    FUN_0181d530(&local_258,"Given ciphertext exceeds max supported size!");
    this_02[2] = 0;
    *(undefined *)(this_02 + 3) = 0;
    *this_02 = 0x34c20b8;
    *(undefined8 **)(this_02 + 1) = this_02 + 3;
    _set_str((BaseException *)this_02,local_258,local_250);
    *this_02 = 0x34b6b08;
    if (local_258 != acStack584) {
      _ZdlPv(local_258);
    }
                    /* WARNING: Subroutine does not return */
    __cxa_throw(this_02,&typeinfo,~Exception);
  }
  local_2d0 = 0;
  local_2c8[0] = '\0';
  local_2d8 = local_2c8;
                    /* try { // try from 0181d668 to 0181d6c0 has its CatchHandler @ 0181da30 */
  cVar1 = decode((Str *)&local_2d8,*(void **)param_2,*(ulong *)(param_2 + 8));
  iv = local_2d8;
  if (cVar1 == '\0') {
    this_01 = (Exception *)__cxa_allocate_exception(0x28);
                    /* try { // try from 0181da54 to 0181da58 has its CatchHandler @ 0181db8d */
    FUN_0181d530(local_2b8,"Could not decode ciphertext!");
                    /* try { // try from 0181da70 to 0181da74 has its CatchHandler @ 0181db6a */
    Exception(this_01,(Logger *)&DAT_035729c0,(Str *)local_2b8,errorSeg);
    if (local_2b8[0] != auStack680) {
      _ZdlPv(local_2b8[0]);
    }
                    /* WARNING: Subroutine does not return */
                    /* try { // try from 0181da99 to 0181da9d has its CatchHandler @ 0181da30 */
    __cxa_throw(this_01,&typeinfo,~Exception);
  }
  if (local_2d0 < param_5 + param_4) {
    this_01 = (Exception *)__cxa_allocate_exception(0x28);
                    /* try { // try from 0181d9e3 to 0181d9e7 has its CatchHandler @ 0181dba8 */
    FUN_0181d530(local_298,"Decoded ciphertext has malformed size!");
                    /* try { // try from 0181d9ff to 0181da03 has its CatchHandler @ 0181db92 */
    Exception(this_01,(Logger *)&DAT_035729c0,(Str *)local_298,errorSeg);
    if (local_298[0] != auStack648) {
      _ZdlPv(local_298[0]);
    }
                    /* WARNING: Subroutine does not return */
                    /* try { // try from 0181da2b to 0181da2f has its CatchHandler @ 0181da30 */
    __cxa_throw(this_01,&typeinfo,~Exception);
  }
  uVar3 = (local_2d0 - param_4) - param_5;
  ciphertext = local_2d8 + param_4;
  ptr = local_2d8 + param_4 + uVar3;
  EvpCryptContext((EvpCryptContext *)evpCipherContext);
                    /* try { // try from 0181d6c1 to 0181d6f5 has its CatchHandler @ 0181d958 */
  cipher = EVP_aes_256_gcm();
  evpSuccessful = EVP_DecryptInit_ex(evpCipherContext[0],cipher,(ENGINE *)0x0,(uchar *)0x0,(uchar *)0x0);
  if (evpSuccessful == 1) {
    iVar2 = EVP_CIPHER_CTX_ctrl(evpCipherContext[0],9,(int)param_4,(void *)0x0);
    if (iVar2 == 1) {
      if (uVar3 < 0x201) {
        plaintext = &local_258;
        local_50 = 0x200;
      }
      else {
                    /* try { // try from 0181d93b to 0181d93f has its CatchHandler @ 0181dbbd */
        plaintext = (char **)operator.new[](uVar3);
        local_50 = uVar3;
      }
      local_58 = plaintext;
      iVar2 = EVP_DecryptInit_ex(evpCipherContext[0],(EVP_CIPHER *)0x0,(ENGINE *)0x0,(uchar *)(this + 1),iv);
      if (iVar2 == 1) {
        if (*(long *)(evpAAD + 8) != 0) {
          iVar2 = EVP_DecryptUpdate(evpCipherContext[0],(uchar *)0x0,&evpOutputLength,*(uchar **)evpAAD,(int)*(long *)(evpAAD + 8));
          if (iVar2 != 1) goto LAB_0181d880;
        }
        iVar2 = EVP_DecryptUpdate(evpCipherContext[0],(uchar *)plaintext,&evpOutputLength,ciphertext,(int)uVar3);
        if (0 < iVar2) {
          local_320 = (int)param_5;
          lVar4 = (long)evpOutputLength;
          iVar2 = EVP_CIPHER_CTX_ctrl(evpCipherContext[0],0x11,local_320,ptr);
          if (iVar2 == 1) {
            iVar2 = EVP_DecryptFinal_ex(evpCipherContext[0],(uchar *)((long)evpOutputLength + (long)plaintext),&evpOutputLength);
            if (0 < iVar2) {
              if (uVar3 != lVar4 + (long)evpOutputLength) {
                    /* WARNING: Subroutine does not return */
                __assert_fail("len == ciphertext_len","/opt/splunk/src/framework/auth/AesGcm.cpp",0xe6,"void AesGcmKey::decrypt(Str*, const Str&, const Str&, size_t, size_t) const");
              }
                    /* try { // try from 0181d7e5 to 0181d7e9 has its CatchHandler @ 0181d981 */
              _M_replace((basic_string<char,std--char_traits<char>,std--allocator<char>> *)param_1,0,*(ulong *)(param_1 + 8),(char *)local_58,uVar3);
              if ((local_58 != &local_258) && (local_58 != (char **)0x0)) {
                _ZdaPv(local_58);
              }
              ~EvpCryptContext((EvpCryptContext *)evpCipherContext);
              if (local_2d8 != local_2c8) {
                _ZdlPv(local_2d8);
              }
              if (local_40 == __stack_chk_guard) {
                return;
              }
                    /* WARNING: Subroutine does not return */
              __stack_chk_fail();
            }
            FUN_0181d490();
          }
        }
      }
LAB_0181d880:
      this_01 = (Exception *)__cxa_allocate_exception(0x28);
                    /* try { // try from 0181d89f to 0181d8a3 has its CatchHandler @ 0181db5a */
      FUN_0181d530(local_278,"AES-GCM Decryption failed!");
                    /* try { // try from 0181d8bb to 0181d8bf has its CatchHandler @ 0181dbc2 */
      Exception(this_01,(Logger *)&DAT_035729c0,(Str *)local_278,errorSeg);
      if (local_278[0] != auStack616) {
        _ZdlPv(local_278[0]);
      }
                    /* WARNING: Subroutine does not return */
                    /* try { // try from 0181d8e7 to 0181d8eb has its CatchHandler @ 0181d981 */
      __cxa_throw(this_01,&typeinfo,~Exception);
    }
  }
  this_00 = (EncryptionAlgorithmException *)__cxa_allocate_exception(0x28);
                    /* try { // try from 0181d907 to 0181d90b has its CatchHandler @ 0181dbe3 */
  EncryptionAlgorithmException(this_00,"OpenSSL EVP Cipher Decrypt Context cannot be initialized!");
                    /* WARNING: Subroutine does not return */
                    /* try { // try from 0181d91d to 0181d921 has its CatchHandler @ 0181d958 */
  __cxa_throw(this_00,&typeinfo,~EncryptionAlgorithmException);
}


Now, there's a lot going on in that function, so I'll highlight our two areas of interest. LAB_0181d880 is an error-handling block (a jump label, not a separate subroutine) that contains the string I was looking for. It is the target of exactly one explicit goto in this function, shown below.

      iVar2 = EVP_DecryptInit_ex(evpCipherContext[0],(EVP_CIPHER *)0x0,(ENGINE *)0x0,(uchar *)(this + 1),iv);
      if (iVar2 == 1) {
        if (*(long *)(evpAAD + 8) != 0) {
          iVar2 = EVP_DecryptUpdate(evpCipherContext[0],(uchar *)0x0,&evpOutputLength,*(uchar **)evpAAD,(int)*(long *)(evpAAD + 8));
          if (iVar2 != 1) goto LAB_0181d880;
        }
        iVar2 = EVP_DecryptUpdate(evpCipherContext[0],(uchar *)plaintext,&evpOutputLength,ciphertext,(int)uVar3);

That jump to LAB_0181d880 is only taken if iVar2 is not equal to 1 following a call to EVP_DecryptUpdate. Great. So. What's that mean?

A bit of Googling later, I found that EVP_DecryptUpdate is a function in the OpenSSL library that takes 5 arguments. In order, these arguments are:

  1. The cipher context (EVP_CIPHER_CTX)
  2. A pointer to a character array in which to output the data
  3. A pointer to an integer that receives the number of bytes written to the output
  4. A pointer to a character array containing the input data
  5. An integer denoting the input data size

The thing that caught my eye is that a null pointer was being passed as the output buffer in this particular call to EVP_DecryptUpdate. After consulting the documentation, I learned that EVP_DecryptUpdate is called in this manner when GCM is being utilized and you want to supply the Additional Authenticated Data (AAD) associated with the ciphertext. EVP_DecryptUpdate will return 1 if the AAD was accepted. The authentication tag, which covers both the AAD and the ciphertext, is then checked by the later call to EVP_DecryptFinal_ex, and a failure there also lands in the same error-handling block.

From the decompiled code and the documentation, I learned that this error message is generated when one of these decryption steps fails, which in practice means the AAD and ciphertext did not validate against the tag. From past experience, I know that AAD validation will fail if any of the following are true:

  1. The ciphertext was modified
  2. The AAD was modified
  3. The key is invalid for the given ciphertext and AAD
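
To make those three failure modes concrete, here is a minimal sketch. Python's standard library has no AES-GCM, so this uses HMAC-SHA256 as a stand-in for the GCM authentication tag; it is not Splunk's code and not AES-GCM itself, only an illustration of the same property. The seal and verify helpers and the AAD value are names I made up for the example.

```python
import hashlib
import hmac
import os


def seal(key: bytes, aad: bytes, ciphertext: bytes) -> bytes:
    """Compute a tag over the AAD and ciphertext (stand-in for the GCM tag)."""
    # Length-prefix the AAD so the AAD/ciphertext boundary is unambiguous.
    msg = len(aad).to_bytes(8, "big") + aad + ciphertext
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key: bytes, aad: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Return True only if the key, AAD, and ciphertext all match the tag."""
    return hmac.compare_digest(seal(key, aad, ciphertext), tag)


key = os.urandom(32)
aad = b"hypothetical-stanza-name"   # assumption: any associated data
ciphertext = os.urandom(48)         # pretend this is an encrypted password
tag = seal(key, aad, ciphertext)

# Flip one bit in the last byte of the ciphertext.
tampered = ciphertext[:-1] + bytes([ciphertext[-1] ^ 1])

checks = {
    "unchanged": verify(key, aad, ciphertext, tag),
    "ciphertext modified": verify(key, aad, tampered, tag),
    "AAD modified": verify(key, aad + b"x", ciphertext, tag),
    "wrong key": verify(os.urandom(32), aad, ciphertext, tag),
}
for name, ok in checks.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Only the unchanged case is accepted. Note that the verifier cannot tell the three failures apart: all it knows is that the tag does not match.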

Now we have 3 possible causes for this issue and no potential solution. I had no way of knowing which of those items was causing the AAD validation failure, and no way to return any of them to their previous state if they had been modified. Time to chalk this installation up as a loss, reimage, and start from scratch. I'd have to reconfigure everything, but at least my problems would be solved.

Reimaging - A not-so-final solution

I conveyed my findings thus far to Splunk support, and was asked to preserve that heavy forwarder instead of reimaging it. So I created a new CentOS instance, installed a brand spanking new copy of Splunk 8.0.2, performed my initial configuration, and deployed the TA I'd been working on. Wouldn't you know it, the problem came back.

At this point, I was at a loss for words. I could only assume something was terribly wrong with the Splunk "storage/passwords" REST endpoint, and couldn't begin to fathom how a bug like this ever made it past QA. I started looking into the way other TAs handle their password storage (spoiler: a surprising number of them just use the plain old KV store endpoint), and was unsatisfied. So I reluctantly started working on my own password storage solution to use with Splunk.

Could it be so simple..?

The next day, as I was drinking my morning coffee, an idea hit me. I hadn't been very careful when I was packaging my TA. Maybe, just maybe, I left some configuration files inside the archive. Maybe even the passwords.conf file. If that were the case, the Splunk configuration loader would be attempting to decrypt the password using the wrong key. This would absolutely cause AAD validation to fail, and would cause that exact error scenario to resurface!

With newfound energy, I logged into my new heavy forwarder, navigated to the local folder within the application folder, and found it! A passwords.conf file. I removed the erroneous file, updated my inputs.conf, crossed my fingers, and restarted Splunk.

Lo and behold, there were no errors! The TA was now working exactly as expected. I was able to recreate the configuration file through the web GUI and start ingesting data.
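
In hindsight, a quick check of the packaged TA would have caught this before deployment. Below is a minimal sketch of such a check, assuming the TA is packaged as a tar archive (for example a .tar.gz) with the usual <app>/local/ layout. The function name find_stray_passwords_conf is my own invention, not part of any Splunk tooling.

```python
from __future__ import annotations

import sys
import tarfile
from pathlib import PurePosixPath


def find_stray_passwords_conf(archive_path: str) -> list[str]:
    """Return archive members that look like <app>/local/passwords.conf."""
    hits = []
    # "r:*" lets tarfile detect the compression (gzip, bz2, none) itself.
    with tarfile.open(archive_path, mode="r:*") as tar:
        for member in tar.getmembers():
            path = PurePosixPath(member.name)
            if (
                member.isfile()
                and path.name == "passwords.conf"
                and path.parent.name == "local"
            ):
                hits.append(member.name)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in find_stray_passwords_conf(sys.argv[1]):
        print(f"stray credentials file in package: {name}")
```

Run it against the archive before uploading it anywhere. Any line of output means a passwords.conf encrypted with some other instance's key is about to ship with the app.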

Issue Summary

Now that I've had some time to think on it, I can say with a high degree of confidence that the following series of events caused this obscure error to surface:

  1. A botched upgrade attempt from 7.2.5.1 to 8.0.1 caused the encryption key to be changed or updated within Splunk
  2. Splunk's properly implemented crypto code resulted in the ciphertext being rejected instead of decrypted when AAD validation failed
  3. Splunklib is not equipped to handle that particular failure scenario. Instead of generating a coherent error message when the password storage endpoint returned a NoneType, it imploded
  4. The mistake I made when packaging my new TA not only caused me a ton of headache, it also ended up helping me find the solution to this error
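
If you suspect the same thing has happened on an existing instance, the first step is finding which apps carry a passwords.conf in their local folder. Here is a minimal sketch that only lists candidates and deletes nothing. It assumes apps live under $SPLUNK_HOME/etc/apps and falls back to /opt/splunk if SPLUNK_HOME is not set; adjust the path for your install. The function name list_passwords_conf is hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def list_passwords_conf(apps_dir: str) -> list[Path]:
    """Find <apps_dir>/<app>/local/passwords.conf files (read-only scan)."""
    return sorted(Path(apps_dir).glob("*/local/passwords.conf"))


if __name__ == "__main__":
    # Assumption: default install location when SPLUNK_HOME is unset.
    splunk_home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    for conf in list_passwords_conf(os.path.join(splunk_home, "etc", "apps")):
        print(conf)
```

Review the list before removing anything, since every app whose file you delete will need its credentials reconfigured.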

Something good came out of this experience, at least. Now, the next time someone sees the error message "AES-GCM Decryption failed!" within Splunk, Google will be able to help them find a solution.
