Q&A: Timo Steffens | Decipher
Timo Steffens, private security researcher and author of Attribution of Advanced Persistent Threats, discusses some of the major hurdles researchers face in attribution.
Lindsey O’Donnell-Welch: What are the biggest current challenges impacting the attribution process?
Timo Steffens: There are three big challenges: data, group definitions and generic attack techniques. [For data,] strong attribution requires a variety of analytical skills, ranging from malware analysis to forensic log analysis, infrastructure tracking, language skills, and geopolitical analysis. However, most of these skills are useless if you don’t have access to large amounts of data. The amount of data is important for two reasons: first, the more data you have about attacker activity, the higher the likelihood that the attackers made a mistake somewhere in the activity covered by the data. And second, you can also perform attribution without attackers making mistakes, but then you need enough data to identify patterns. A simple example of these patterns is the timestamp of attacker activity. Two timestamps alone don’t form a pattern, but if you have a lot of timestamps, you might be able to identify a timezone that points to the origin or at least the attackers’ workplace (of course, a timezone will be only a small piece of the overall puzzle that attribution analysis must solve). Additionally, much of the data that can help with attribution is difficult to access, due to privacy, GDPR, or EULAs. For example, as an academic researcher, you might find it difficult to obtain detailed data (like internal log files) from a company that has been the victim of a cyberattack. Similarly, it is difficult to access a copy of a control server used by attackers. Usually, only law enforcement agencies have the mandate to request this data.
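As a rough illustration of the timestamp idea, the Python sketch below ranks whole-hour UTC offsets by how many observed events would fall into local working hours. It is not a method described in the interview: the function name `guess_utc_offsets`, the 09:00 to 17:00 working-hours window, and the sample events are all assumptions made for the example, and real activity data would be far noisier.

```python
from collections import Counter
from datetime import datetime, timezone


def guess_utc_offsets(timestamps, work_start=9, work_end=17):
    """Rank whole-hour UTC offsets by events falling into local working hours.

    Assumes timezone-aware datetimes and a hypothetical 09:00-17:00 workday.
    """
    # Count events per hour of day, normalised to UTC.
    hours = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    scores = {}
    for offset in range(-11, 13):  # 24 distinct whole-hour offsets
        scores[offset] = sum(
            count
            for hour, count in hours.items()
            if work_start <= (hour + offset) % 24 < work_end
        )
    # Best score first; ties are broken in favour of the smaller absolute offset.
    return sorted(scores.items(), key=lambda item: (-item[1], abs(item[0])))


# Invented sample: five days of activity between 06:30 and 13:30 UTC.
events = [
    datetime(2024, 1, day, hour, 30, tzinfo=timezone.utc)
    for day in range(1, 6)
    for hour in range(6, 14)
]
ranked = guess_utc_offsets(events)
print(ranked[0])
```

With only two timestamps, almost every offset would score the same, which is why the volume of data matters. Even a clear winner is only a hint: shift work, deliberate scheduling, or a workday that does not match the assumed window would all skew the result.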
[For group definitions,] the most public and political attention is given to the last stage of attribution, which is the identification of the country of origin. But it is important to remember that attribution consists of several phases. This is what I call the “4C model”: Collect, Cluster, Charge and Communicate. First you need to collect data of different types from different sources. Next, you group attacks that are technically and strategically similar, assuming they were carried out by the same actors. This step is called “clustering” because it places all data points and artifacts that might belong to the same culprits in a bucket with a fancy name like “APT28” or “PutterPanda”. Then you can work on all that data to “charge” and try to identify the country, organization, agency, or even individuals behind the attacks. And finally, you can decide if and how you want to communicate your results. The point here is this: the real magic happens in the clustering phase. This is really the hardest part to understand. If you carefully examine threat reports from different security companies, you will notice that they generally do not disagree on the likely country of origin. But they all strongly disagree on the exact definition of a group: does this control server belong to group A or group B? Is this malware family used exclusively by one group (so that all attacks with this malware family can be attributed to the same group), or is the malware shared? Moreover, this problem of group definitions (i.e. clustering) becomes more difficult over time, because groups are not static and monolithic. Malware developers and operators can change jobs and affiliations, get promoted or transferred to another team, or decide to start their own business. In all of these cases, they are likely to take with them tools, source code, or just a few ideas and habits, which will later lead security analysts to attribute their new activity to their old group of attackers.
Threat intelligence is full of attack groups believed to have been active for a decade or more. It is difficult to decide when these group definitions should be retired or redefined.
[For generic attack techniques,] clustering (tying an attack to a group definition) is straightforward if attackers use their own malware or idiosyncratic techniques. Unfortunately, many attackers decide to use tools and techniques that are freely available and therefore not specific to certain groups. Examples are publicly available penetration testing tools and frameworks like Empire or Mimikatz, or even copies of Cobalt Strike.
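To make the clustering problem concrete, here is a minimal Python sketch that groups incidents whenever they share an indicator, using a small union-find structure. It is a deliberately naive illustration, not how analysts or any particular vendor actually cluster: the function `cluster_incidents`, the incident names, and the indicators such as `custom-backdoor-x` are invented, and the `GENERIC` set simply hard-codes the tools named above.

```python
from collections import defaultdict

# Tools treated as generic, i.e. not specific to any one group (assumption).
GENERIC = frozenset({"Mimikatz", "Empire", "Cobalt Strike"})


def cluster_incidents(incidents, generic=GENERIC):
    """Group incident names that share at least one non-generic indicator."""
    parent = {name: name for name in incidents}

    def find(x):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # indicator -> first incident in which it appeared
    for name, indicators in incidents.items():
        for indicator in indicators - generic:
            if indicator in first_seen:
                # Shared indicator: merge the two incidents' clusters.
                parent[find(name)] = find(first_seen[indicator])
            else:
                first_seen[indicator] = name

    groups = defaultdict(set)
    for name in incidents:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


# Invented incidents and indicators, for illustration only.
incidents = {
    "incident-1": {"custom-backdoor-x", "c2.example.net", "Mimikatz"},
    "incident-2": {"custom-backdoor-x", "Cobalt Strike"},
    "incident-3": {"Mimikatz", "Cobalt Strike"},
    "incident-4": {"c2.example.org", "Empire"},
}
filtered = cluster_incidents(incidents)
naive = cluster_incidents(incidents, generic=frozenset())
print(filtered)
print(naive)
```

When generic tools are ignored, only the two incidents that share the custom backdoor are merged, and an incident that used nothing but generic tools cannot be tied to any group. When they are counted as evidence, three unrelated-looking incidents collapse into a single cluster, which is exactly the kind of over-merging that freely available tooling invites.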