C4LEB's Blogs

A win for privacy

C4LEB Blog Last Activity 3 weeks ago 426 views 7 comments

... or, Why you shouldn't post personal details on GBT (or anywhere)

Comments

C4LEB

3 weeks ago

Reasons for not putting personal details on GBT

There are people out there who want your data, and there are courses out there that teach how to collect it

What Is Web Scraping? A Complete Beginner’s Guide

As the digital economy expands, the role of web scraping becomes ever more important. Read on to learn what web scraping is, how it works, and why it’s so important for data analytics.

The amount of data in our lives is growing exponentially. With this surge, data analytics has become a hugely important part of the way organizations are run. And while data has many sources, its biggest repository is on the web. As the fields of big data analytics, artificial intelligence, and machine learning grow, companies need data analysts who can scrape the web in increasingly sophisticated ways.

This beginner’s guide offers a total introduction to web scraping, what it is, how it’s used, and what the process involves. We’ll cover:

Before we get into the details, though, let’s start with the simple stuff…

1. What is web scraping?

Web scraping (or data scraping) is a technique used to collect content and data from the internet. This data is usually saved in a local file so that it can be manipulated and analyzed as needed. If you’ve ever copied and pasted content from a website into an Excel spreadsheet, this is essentially what web scraping is, but on a very small scale.

However, when people refer to ‘web scrapers,’ they’re usually talking about software applications. Web scraping applications (or ‘bots’) are programmed to visit websites, grab the relevant pages and extract useful information. By automating this process, these bots can extract huge amounts of data in a very short time. This has obvious benefits in the digital age, when big data, which is constantly updating and changing, plays such a prominent role. You can learn more about the nature of big data in this post.

What kinds of data can you scrape from the web?

If there’s data on a website, then in theory, it’s scrapable! Common data types organizations collect include images, videos, text, product information, customer sentiments and reviews (on sites like Twitter, Yell, or Tripadvisor), and pricing from comparison websites. There are some legal rules about what types of information you can scrape, but we’ll cover these later on.

2. What is web scraping used for?

Web scraping has countless applications, especially within the field of data analytics. Market research companies use scrapers to pull data from social media or online forums for things like customer sentiment analysis. Others scrape data from product sites like Amazon or eBay to support competitor analysis.

Meanwhile, Google regularly uses web scraping to analyze, rank, and index their content. Web scraping also allows them to extract information from third-party websites before redirecting it to their own (for instance, they scrape e-commerce sites to populate Google Shopping).

Many companies also carry out contact scraping, which is when they scrape the web for contact information to be used for marketing purposes. If you’ve ever granted a company access to your contacts in exchange for using their services, then you’ve given them permission to do just this.

There are few restrictions on how web scraping can be used. It’s essentially down to how creative you are and what your end goal is. From real estate listings, to weather data, to carrying out SEO audits, the list is pretty much endless!

However, it should be noted that web scraping also has a dark underbelly. Bad players often scrape data like bank details or other personal information to conduct fraud, scams, intellectual property theft, and extortion. It’s good to be aware of these dangers before starting your own web scraping journey. Make sure you keep abreast of the legal rules around web scraping.

3. How does a web scraper function?

So, we now know what web scraping is, and why different organizations use it. But how does a web scraper work? While the exact method differs depending on the software or tools you’re using, all web scraping bots follow three basic principles:

Step 1: Making an HTTP request to a server
Step 2: Extracting and parsing (or breaking down) the website’s code
Step 3: Saving the relevant data locally

• https://careerfoundry.com/en/blog/data-analytics/web-scraping-guide/
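To show how little effort those three steps take, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not how any particular scraper works: the function names, the email pattern, and the sample profile page are all made up for this example, and the addresses use reserved example domains. The fetch step is defined but not called, so the sketch runs offline against the sample page.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.request import urlopen

# Deliberately simple email pattern (an assumption; real addresses can be more varied).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TextExtractor(HTMLParser):
    """Collects the page's text and the targets of mailto: links."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("mailto:"):
                    self.chunks.append(value[len("mailto:"):])


def fetch(url):
    """Step 1: make an HTTP request and return the page source as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_emails(html):
    """Step 2: parse the HTML and pull out anything that looks like an email."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return sorted(set(EMAIL_RE.findall(text)))


def save_csv(emails, out):
    """Step 3: save the extracted data (here, to any writable text stream)."""
    writer = csv.writer(out)
    writer.writerow(["email"])
    for email in emails:
        writer.writerow([email])


# Hypothetical profile page standing in for a fetched one.
SAMPLE_PROFILE = """<html><body><h1>About Me</h1>
<p>Contact: jane.doe@example.com</p>
<a href="mailto:jane.doe@example.com">mail me</a>
<p>Alt: jd_backup@example.org</p></body></html>"""

if __name__ == "__main__":
    found = extract_emails(SAMPLE_PROFILE)
    buffer = io.StringIO()
    save_csv(found, buffer)
    print(buffer.getvalue())
```

A bot only needs to loop something like this over a list of profile URLs to harvest every address posted in plain view, which is the practical reason for keeping contact details off public pages.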

C4LEB

3 weeks ago

How to poison the data that Big Tech uses to surveil you

Algorithms are meaningless without good data. The public can exploit that to demand change.

Every day, your life leaves a trail of digital breadcrumbs that tech giants use to track you. You send an email, order some food, stream a show. They get back valuable packets of data to build up their understanding of your preferences. That data is fed into machine-learning algorithms to target you with ads and recommendations. Google cashes your data in for over $120 billion a year of ad revenue.

Increasingly, we can no longer opt out of this arrangement.

[T]hree ways the public can exploit this to their advantage:

□ Data strikes, inspired by the idea of labor strikes, which involve withholding or deleting your data so a tech firm cannot use it—leaving a platform or installing privacy tools, for instance.

□ Data poisoning, which involves contributing meaningless or harmful data. AdNauseam, for example, is a browser extension that clicks on every single ad served to you, thus confusing Google’s ad-targeting algorithms.

□ Conscious data contribution, which involves giving meaningful data to the competitor of a platform you want to protest, such as by uploading your Facebook photos to Tumblr instead.

People already use many of these tactics to protect their own privacy. If you’ve ever used an ad blocker or another browser extension that modifies your search results to exclude certain websites, you’ve engaged in data striking and reclaimed some agency over the use of your data. But [...] sporadic individual actions like these don’t do much to get tech giants to change their behaviors.

What if millions of people were to coordinate to poison a tech giant’s data well, though? That might just give them some leverage to assert their demands.

Just this week, Google also announced that it would stop tracking individuals across the web and targeting ads at them. While it’s unclear whether this is a real change or just a rebranding [...] it’s possible that the increased use of tools like AdNauseam contributed to that decision by degrading the effectiveness of the company’s algorithms. (Of course, it’s ultimately hard to tell. “The only person who really knows how effectively a data leverage movement impacted a system is the tech company,” [...])

• https://www.technologyreview.com/2021/03/05/1020376/resist-big-tech-surveillance-data/

AdNauseam for Android Firefox: https://addons.mozilla.org/en-US/firefox/addon/adnauseam/

TrackMeNot for desktop Firefox: https://addons.mozilla.org/en-US/firefox/addon/trackmenot/

C4LEB

4 weeks ago

Privacy in the digital age: comparing and contrasting individual versus social approaches towards privacy

Part of the open access paper published in Ethics and Information Technology
Becker, M. Privacy in the digital age: comparing and contrasting individual versus social approaches towards privacy. Ethics Inf Technol 21, 307–317 (2019). https://doi.org/10.1007/s10676-019-09508-z via https://link.springer.com/article/10.1007/s10676-019-09508-z

The importance of privacy: autonomy

The history of justifications of privacy starts with Warren and Brandeis’s (1890) legal definition of privacy as the right to be left alone. This classic definition is completely in line with the literal meaning of privacy. The word is a negativum (related to deprive) of public. The right to privacy is essentially the right of individuals to have their own domain, separated from the public (Solove 2015). The basic way to describe this right to be left alone is in terms of access to a person. In classic articles, Gavison and Reiman characterize privacy as the degree of access that others have to you through information, attendance, and proximity (Gavison 1984; Reiman 1984).

Discussion about the importance of privacy for the individual intensified in the second half of the twentieth century, as patterns of living in societies became more and more individualistic. Privacy became linked to the valued notion of autonomy and the underlying idea of individual freedom. In both literature on privacy and judicial statements, this connection between privacy and autonomy has been a topic of intense discussion. Sometimes the two concepts were even blended together, even though they should remain distinct. A sharp distinction between privacy and autonomy is necessary to get to grips with the normative dimension of privacy.

The concept autonomy is derived from the ancient Greek words autos (self) and nomos (law). Especially within the Kantian framework, the concept is explicated in terms of a rational individual who, reflecting independently, takes his own decisions. Being autonomous was thus understood mainly as having control over one’s own life. In many domains of professional ethics (healthcare, consumer protection, and scientific research), autonomy is a key concept in defining how human beings should be treated. The right of individuals to control their own life should always be respected. The patient, the consumer, and the research participant each must be able to make his or her own choices (Strandburg 2014). Physicians are supposed to fully inform patients; advertisers who are caught lying are censured; and informed consent is a standard requirement of research ethics. In each of these cases, persons should not be forced, tempted, or seduced into performing actions they do not want to do.

When privacy and autonomy are connected, privacy is described as a way of controlling one’s own personal environment. An invasion of privacy disturbs control over (or access to) one’s personal sphere. This notion of privacy is closely related to secrecy. A person who deliberately gains access to information that the other person wants to keep secret is violating the other person’s autonomy through information control. We see the emphasis on privacy as control over information in, for instance, Marmor’s description of privacy as ‘grounded in people’s interest in having a reasonable measure of control over the ways in which they can present themselves to others’ (Marmor 2015). Autonomy, however, does not entail an exhaustive description of privacy. It is possible that someone could have the ability to control, yet he or she lacks privacy. For instance, a woman who frequently absentmindedly forgets to close the curtains before she undresses enables her neighbour to watch her. If the neighbour does so, we can speak about a loss of the woman’s privacy. Nevertheless, the woman still has the ability to control. At any moment, she could choose to close the curtains. Thus, privacy requires more than just autonomy.

The distinction between privacy and autonomy becomes clearer in Judith Jarvis Thomson’s classic thought experiment (Taylor 2002). Imagine that my neighbour invented some elaborate X-ray device that enabled him to look through the walls. I would thereby lose control over who can look at me, but my privacy would not be violated until my neighbour actually started to look through the walls. It is the actual looking that violates privacy, not the acquisition of the power to look. If my neighbour starts observing through the walls but I’m not aware of it and believe that I am carrying out my duties in the privacy of my own home, my autonomy would not be directly undermined. Not only in thought experiments, but also in literature and everyday life, we witness the difference between autonomy and privacy. Taylor refers to Scrooge in Dickens’ A Christmas Carol who is present as a ghost at family parties. His covert observation of the intimate Christmas dinner party implies a breach of privacy, although he does not influence the behaviour of the other people. In everyday life, we do not experience an inadvertent breach of privacy (for instance, a passer-by randomly picking up some information) as loss of autonomy.

These examples make it clear that there is a difference between autonomy, which is about control, and privacy, which is about knowledge and access to information. The most natural way to connect the two concepts is to consider privacy as a tool that fosters and encourages autonomy. Privacy thus understood contributes to demarcation of a personal sphere, which makes it easier for a person to make decisions independently of other people. But a loss of privacy does not automatically imply loss of autonomy. A violation of privacy will result in autonomy being undermined only when at least one additional condition is met: the observing (privacy-violating) person is in one way or another influencing the other person (Taylor 2002). Such a violation of privacy can take various forms. For instance, the person involved might feel pressure to alter her behaviour just because she knows she is being observed. Or a person who is not aware of being observed is being manipulated. This, in fact, occurs more than ever before in the digital age.

Loss of autonomy in the digital age

In the more than 100 years following Warren and Brandeis’ publication of their definition, privacy was mainly considered to be a spatial notion. For example, the right to be left alone was the right to have one’s own space in a territorial sense, e.g., at home behind closed curtains, where other people were not allowed. An important topic in discussions of privacy was the embarrassment experienced when someone else entered the private spatial domain. Consider, for example, public figures whose privacy is invaded by obtrusive photographers or people who feel invaded when someone unexpectedly enters their home (Roessler 2009; Gavison 1984).

The digital age is characterized by the omnipresence of hidden cameras and other surveillance devices. This kind of observation and the corresponding embarrassment that it can cause have changed our ideas about privacy. The main concern is not the intrusive eye of another person, but the constant observation, which can lead to the panopticon experience of the interiorized gaze of the other. It is self-evident that the additional conditions are now being met, viz., the person’s autonomy is threatened. In situations in which the observed person feels impeded to follow his impulses (Van Otterloo 2014), the loss of privacy leads to diminished autonomy.

The loss of autonomy resulting from persistent surveillance becomes even more striking when we take into consideration the unprecedented collection and storage of non-visual information. Collecting data on individuals, such as through the activity of profiling, offers commercial parties and other institutions endless possibilities for approaching people in ways that meet the institution’s own interests. Driven by invisible algorithms, these institutions tempt, nudge, seduce, and convince individuals to participate for reasons that are advantageous to the institution. The widespread application of algorithms in decision-making processes intensifies the problem of loss of autonomy in at least two respects. First, when algorithms are used to track people’s behaviour, there is no ‘observer’ in the strict sense of the word; no human (or other ‘cognitive entity’) actually ever checks the individual’s search profile. Nevertheless, the invisibility of the watchful entity does not diminish the precision with which the behaviour is being tracked; in fact, it is quite the opposite. Second, in the digital age mere awareness of the possibility that surveillance techniques exist has an impact on human behaviour, independently of whether there is actually an observing entity. More than ever before, Foucault’s (1975) addition to Bentham’s panopticon model is relevant. The gaze of the other person is internalized.

This brings us to the conclusion that, despite the fact that a loss of privacy does not necessarily involve a loss of autonomy, in the digital age when privacy is under threat, the independence of individual decisions is typically also compromised.

C4LEB

4 weeks ago

Google to purge billions of files containing Chrome users' personal data

Google has agreed to purge billions of records containing personal information collected from more than 136 million users of its Chrome web browser in the US.
The massive house cleaning comes as part of a settlement from a lawsuit accusing the search giant of illegal surveillance.
Among other allegations, the lawsuit accused Google of tracking Chrome users' internet activity even when they had switched the browser to the "Incognito" setting that is supposed to shield them from being shadowed by the California-based company.
The settlement requires Google to expunge billions of personal records stored in its data centres and make more prominent privacy disclosures about Chrome's Incognito option when it is activated.
It also imposes other controls designed to limit Google's collection of personal information.
The company said it was only required to "delete old personal technical data that was never associated with an individual and was never used for any form of personalisation".
In court papers, the attorneys representing Chrome users painted a much different picture, depicting the settlement as a major victory for personal privacy in an age of ever-increasing digital surveillance.
• https://www.abc.net.au/news/2024-04-02/google-to-purge-billions-of-files-containing-personal-data-in-se/103657584

C4LEB

4 weeks ago

Google to warn users they can be tracked while in Chrome's 'Incognito' mode

Google's Chrome web browser will soon include a warning that "private" browsing does not prevent users from being tracked, a move that comes months after Google settled a related privacy lawsuit.

A pre-release version of the browser has been found to include an updated privacy warning that addresses a key piece of evidence in the settled lawsuit — that users "overestimate" the privacy protection features in Chrome.

When the change is rolled out, users will be warned that private browsing "won't change how data is collected by websites you visit and the services they use, including Google".

The privacy lawsuit claimed tracking users in "Incognito" mode "allows Google to offer better, more targeted, advertisements to users".

It argued that this was "the core of Google's business", with the tech company generating billions in revenue from its advertising business each year, and that Google sold data that was collected from private browsing sessions.

• https://www.abc.net.au/news/2024-01-18/chrome-incognito-mode-privacy-warning-change/103361328

C4LEB

4 weeks ago

See your identity pieced together from stolen data

(Australian users)

Have you ever wondered how much of your personal information is available online? Here’s your chance to find out.

• https://www.abc.net.au/news/2023-05-18/data-breaches-your-identity-interactive/102175688

C4LEB

4 weeks ago

This (the above) is why some of you have received these (and other) messages:

□ "Please do not display personal information such as social media addresses. This is for on-site safety and personal protection."

□ "Hello there, would you remove your email address from your profile, please? Perhaps replace it with "DM me for contact" or similar. Personal information on walls is discouraged for privacy and e-safety reasons."

The whole point of this intervention is to not have third-party snoops using file-sharing as an excuse to shut GBT down. Putting such requests in your About Me section might work, but honestly, "DM for contact" is suitably vague.
