Saturday, July 2, 2022

Staying logged into a web site when using puppeteer

Do you want to stay logged into a site after manually logging in with puppeteer?

According to the following GitHub thread, this task is not as easy as just specifying the user data directory.

https://github.com/puppeteer/puppeteer/issues/921

The workaround is to save the cookies after logging in and then load those cookies in subsequent sessions. Here is the answer from Stack Overflow.

https://stackoverflow.com/questions/56514877/how-to-save-cookies-and-load-it-in-another-puppeteer-session#56515357


Saturday, June 25, 2022

What is trust in cyber security?

What is trust? The philosophy of trust is fascinating. The definition of trust is "Assured reliance on the character, ability, strength, or truth of someone or something".

Where does the trustor's belief come from? My point of view is that the trustor must have a past experience or foundational belief that informs the level of trust or distrust the trustor will have in the trustee. Without some knowledge prior to an interaction, there is likely to be no foundation for trust or distrust. Every adult human on the face of the earth will have a weighted perspective on how much they believe a trustee will act in a way that is beneficial to the trustor.

So far this post has been a very philosophical discussion. Why am I even talking about the concept of trust? In information security, trust is the bedrock for nearly every interaction (or at least it should be). When a person attempts to redeem a gift card online, the web application owner assumes that if valid card details are entered, then that person must be the owner of the gift card. The online shop will then add the balance of the gift card to the person's account. It would be amazing if the world were this simple, but online fraudsters will take advantage of this inherent trust. Fraudsters can brute force gift card validation endpoints using automation (aka bots) to redeem balances on gift cards.

How can a website owner distinguish between legitimate users and fraudsters for these cases?

There are a number of identifiers that are available with a given request including:

  • Source of the traffic (IP and Geo IP)
  • Owner of the source traffic (IP ASN owner and OrgName)
  • HTTP request headers
  • Rate of traffic

This is not an exhaustive list, and there are many subcategories of these fields that could be utilized as well. This is especially true of the rate of traffic. If a million requests are sent within a 5 minute time frame from a single user, then most web applications would likely consider that abusive or fraudulent. These million requests would not be considered trustworthy since they vastly exceed normal user usage. However, there are applications where this may be acceptable behavior. Deciding what qualifies as "normal" requires some prior knowledge or an agreed definition of what "normal" really is.
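
As a rough illustration of the rate-of-traffic point, here is a minimal Python sketch of a sliding-window counter. The class name, the client identifier, and the threshold of 3 requests per 300 seconds are all made up for the example; a real baseline would have to come from prior knowledge of what "normal" looks like for the application. The sketch also assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Track request timestamps per client over a fixed look-back window."""

    def __init__(self, window_seconds: float, max_requests: int) -> None:
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._events = defaultdict(deque)  # client id -> recent timestamps

    def record(self, client_id: str, timestamp: float) -> bool:
        """Record one request; return True while the client stays within the baseline."""
        events = self._events[client_id]
        events.append(timestamp)
        # Drop timestamps that have aged out of the look-back window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events) <= self.max_requests


# Hypothetical baseline: at most 3 requests per client in any 300 second window.
counter = SlidingWindowCounter(window_seconds=300, max_requests=3)

# Four requests within 30 seconds: the fourth one exceeds the baseline.
results = [counter.record("203.0.113.7", t) for t in (0, 10, 20, 30)]

# Much later the old requests have aged out, so the client is within the baseline again.
later = counter.record("203.0.113.7", 400)
```

The counter only answers "is this client above the baseline right now?". Whether exceeding it means fraud is still a judgment that depends on the application, which is the point of the paragraph above.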

We cannot infer trust in a vacuum. We must rely on prior knowledge to judge whether something or someone is trustworthy. As we continue to progress in cyber security to fight fraud, it will be interesting to see how an individual's history of good or bad behavior is recorded.

Here are some questions that I have for maybe a later post.

If my online persona violates the ToS for a site, then does that get recorded somewhere? Should it? When should it be a requirement for my real or true identity to be used for interacting with a site instead of having the ability to use a persona?

Monday, December 20, 2021

Risk of incorrectly classifying people as bots

I really like two different shows. The Netflix show Black Mirror illustrates various dystopian futures and Reply All is usually an upbeat show that reports on a wide spectrum of topics.

Recently Reply All put out the episode State of Panic, in which critics of a Florida politician received varying degrees of unwanted attention, to put it lightly. The attention came as a large volume of undesirable direct messages (DMs) and tweets directed at the critic. The messages were so narrowly focused that they gave the impression of being driven by a small set of individuals operating a large number of bot managed Twitter accounts. However, after some investigation, the good folks at Reply All found that the messages actually came from a number of very zealous Twitter followers of the politician. The Florida politician, while having some controversial beliefs, had deeply connected with many people online.

Below is the short synopsis of the Men Against Fire episode of Black Mirror from Wikipedia.

The episode follows Stripe (Malachi Kirby), a soldier who hunts humanoid mutants known as roaches. After a malfunctioning of his MASS, a neural implant, he discovers that these "roaches" are ordinary human beings. In a fateful confrontation with the psychologist Arquette (Michael Kelly), Stripe learns that the MASS alters his perception of reality.

For the soldier, it is much easier to eliminate the "roaches" than to eliminate real people. Once the soldier becomes aware that his actions are impacting people instead of "roaches", he starts to empathize and question his overall mission. In the case of Reply All's State of Panic episode, the recipients of the harassment viewed the messages as originating from bot managed accounts since it was difficult to believe that so many real people would hold that specific set of beliefs. While harassment is certainly not helpful, we still have to acknowledge that these are real people's voices. When we incorrectly classify public web discourse as bot traffic, we are minimizing the viewpoints of those individuals. The viewpoints may be misguided or factually incorrect, but they are still the perspectives of those people.

Large numbers of fake accounts that are managed by a small set of individuals can absolutely be a problem. Platforms should take steps to reduce the influence of fake accounts where possible. However, we should also avoid the knee jerk reaction of classifying controversial opinions as originating from "bots". Otherwise, we will not understand the viewpoints of a large population of individuals.

Monday, August 30, 2021

E-commerce Bot Economics

 

What does supply and demand have to do with bots?

For this post, I am not talking about bots that are performing attacks such as SQLi or Account Takeover (ATO). This post will strictly explore the grey area of web scraping and cart checkout based bots. For folks who have studied economics, you are likely familiar with the tried and true supply and demand curve. A quick refresher can be found on Wikipedia, https://en.wikipedia.org/wiki/Supply_and_demand

Xbox and PS5 bots

There are a number of tools that exist for going through a checkout process to buy products. The bot operators typically fall into a few different buckets:

The motivation for an organization to buy thousands or even hundreds of thousands of PS5s is... drumroll... MONEY! There is an enormous opportunity to buy low and sell high with this highly sought after product.

Into the economics

There is a substantial delta between what customers are willing to pay for a product and what the product is initially sold for. In a free marketplace, the high demand would be a signal to charge customers more for the product. Charging a higher sticker price allows demand to go down to meet supply and ultimately results in higher profits for the business. More importantly, this removes the profit margin that can exist in secondary markets. A bot management solution can be a helpful tool for raising the barrier to entry for a bot operator. However, if there is enough of a margin between the price that the primary seller charges and the price that the end consumer is willing to pay, then bot operators will find a way around any bot management solution. The ultimate solution is to close the gap between the primary market and any secondary or end consumer tooling markets.
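
To make the gap concrete, here is a small Python sketch that assumes a straight-line demand curve and a fixed supply. The function names and every number in it are invented for illustration and are not real console sales figures; real demand curves are rarely this tidy.

```python
def quantity_demanded(max_demand: float, slope: float, price: float) -> float:
    """Units buyers want at a given price, for linear demand Q = max_demand - slope * P."""
    return max(0.0, max_demand - slope * price)


def clearing_price(max_demand: float, slope: float, supply: float) -> float:
    """Price at which the linear demand above equals a fixed supply."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    return max(0.0, (max_demand - supply) / slope)


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
max_demand = 2_000_000   # units wanted if the product were free
slope = 2_000            # units of demand lost for each extra dollar of price
supply = 1_000_000       # units the primary seller can actually ship
sticker_price = 400.0    # what the primary seller charges

# At the sticker price, more people want the product than there are units.
shortage = quantity_demanded(max_demand, slope, sticker_price) - supply

# The price at which demand falls to meet supply, and the gap resellers can capture.
market_price = clearing_price(max_demand, slope, supply)
resale_gap = market_price - sticker_price
```

With these assumed inputs, 1,200,000 buyers chase 1,000,000 units at the 400 dollar sticker price, the clearing price is 500 dollars, and the 100 dollar difference per unit is the margin available to a secondary market. Raising the sticker price toward 500 shrinks both the shortage and that margin.
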

Primary market sellers cannot always raise prices

There are a number of marketplaces where the primary sellers cannot raise prices. For example, on ticketing platforms the ticket prices are often set by the artists who will be performing. Many artists set ticket prices substantially below market value in the hope that "real fans" will be able to buy the tickets. Bot operators purchase the tickets instead and resell them at the market rate on a secondary market for a healthy profit, and the fans get frustrated with the ticket selling vendor. The artists are then able to deflect any criticism about ticket prices onto the ticket selling vendor for not implementing sufficient anti-scalping measures.

What is the solution?

The real solution to this problem is an economic one. There are two options: either increase supply or reduce demand. To continue with the venue ticketing example, increasing supply is a simple concept. If artists do substantially more shows so that more seats are available, then supply goes up. This is simple in concept, but not easy or desirable to implement since it requires much more work from the artists and supporting staff for less of a return. This is the approach that Kid Rock took. I would encourage everyone to listen to this podcast for details on that use case, https://www.npr.org/transcripts/671583061. The other approach is to reduce demand, which is also a simple concept. Taylor Swift did this by raising her ticket prices. The following link contains those details, https://econlife.com/2019/12/higher-concert-ticket-prices/.

These solutions are great if the seller has the ability and will to make these changes, but often the seller has a deficit in one of these areas. As a consequence, sellers are forced to examine how to increase the barriers to entry to make it less profitable for secondary markets to operate. An effective bot management solution and identity management solution can be a critical piece in increasing the costs for secondary markets. However, if the gap between the price customers are willing to pay and the price charged on the primary market remains wide, then secondary markets will remain profitable.
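
The effect of raising the barrier to entry can be sketched with simple per-unit arithmetic. This is a hypothetical Python model, not a description of how any real reseller accounts for costs: the function name, the fee rate, the cost per checkout attempt, and the success rates are all assumptions chosen for illustration.

```python
def reseller_profit_per_unit(
    resale_price: float,
    retail_price: float,
    cost_per_attempt: float,
    success_rate: float,
    marketplace_fee_rate: float = 0.0,
) -> float:
    """Expected profit on one unit that a bot operator manages to buy and resell."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # On average it takes 1 / success_rate checkout attempts to land one unit,
    # and each attempt costs money (proxies, captcha solving, accounts).
    acquisition_overhead = cost_per_attempt / success_rate
    fees = resale_price * marketplace_fee_rate
    return resale_price - fees - retail_price - acquisition_overhead


# Assumed scenario: 500 retail, 800 resale, 10% secondary marketplace fee,
# 2 dollars of tooling cost per checkout attempt.
without_bot_management = reseller_profit_per_unit(800.0, 500.0, 2.0, 0.5, 0.1)

# Same scenario, but bot management cuts the checkout success rate to 2%.
with_bot_management = reseller_profit_per_unit(800.0, 500.0, 2.0, 0.02, 0.1)
```

Under these assumptions the profit per unit drops from about 216 to about 120, so the extra friction hurts the bot operator but does not remove the incentive. The margin only disappears once the price gap itself closes.
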

Where can this mindset be applied?

I would argue that the true customer for companies that sell products at below market value is the supplier. The initial sellers need the ability to charge more for these high demand products, or they will be forced to add more friction to the buying process to try to combat the secondary market. This economics approach may be considered for really any highly sought after product. This includes physical products like shoes (sneaker bots), baby clothes, GPUs, airline tickets, and PS5s.

Bot Operator Tooling

The tooling that is used to buy the product from the primary marketplace varies, and can include items like the following:

  • Browser plug-ins
  • Python Requests and Beautiful Soup
  • Selenium
  • Anti-captcha solutions to defeat captchas
  • Residential and mobile network proxies


Appendix