---
layout: presentation
title: Bias in Machine Learning  --Week 6--
description: Bias in Machine Learning
class: middle, center, inverse
---
background-image: url(img/people.png)

.left-column50[
# Week 6: Bias in Machine Learning

{{site.classnum}}, {{site.quarter}}
]
---
name: normal
layout: true
class:

---
# Important Reminder

## This is an important reminder
## Make sure Zoom is running and recording!!!
## Make sure captioning is turned on

---
[//]: # (Outline Slide)
# Learning Goals for Today

- What is Machine Learning (ML)? 

- What are the components of ML?

- How do we collect data? Who do we collect the data from?

- Is the data "good"?

- How do we minimize disability bias?

---
# Machine Learning

![:img Screenshots of recent news articles on machine learning,100%, width](img/data-equity/ml-news.png)

.center[**But really, *what is it*?**]

---
# Machine Learning

Machine Learning changes the way we think about a problem.

But *how*?

- What is the *traditional* approach to solve a problem?

- How does Machine Learning solve a problem?


---
# Helping Computers Learn Patterns

.left-column50[
![:fa bed, fa-7x]
]
.right-column50[
## How might you recognize sleep?

- Can you come up with a yes/no question, a set of categories, or a simple description of sleep?
   - Sleep quality?
   - Sleep start/end? 
- What data would you learn from?
- How might you need to take disabilities into account?
]
???

(sleep quality? length?...)

How to interpret sensors?

---
# How do we program this?

Old Approach: Create software by hand
- Use libraries (like jQuery) and frameworks
- Create content, do layout, code up functionality
- Deterministic (code does what you tell it to)

New Approach: Collect data and train algorithms
- Will still do the above, but will also have some functionality based
on ML
- *Collect lots of examples and train a ML algorithm*
- *Statistical way of thinking*

---
# Shift in Approaches

.left-column50[
## Old style of app design
<div class="mermaid">
graph TD

I(Input) --Explicit Interaction--> A(Application)
A --> Act(Action)

classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px;

class U,C,A,I,S,E,Act,Act2 normal
</div>
]


--
count: false
.right-column50[
## New style of app design
<div class="mermaid">
graph TD

U(User) --Implicit Sensing--> C(Application)
S(System) --Implicit Sensing--> C
E(Environment) --Implicit Sensing--> C
C --> Act2(Action)

classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px;

class U,C,A,I,S,E,Act,Act2 normal
</div>
]
---
# This is *Machine Intelligence*

Often used to process sensor data

Goal is to develop systems that can improve
performance with more experience
- Can use "example data" as "experience"
- Uses these examples to discern patterns
- And to make predictions

Not really intelligent; just my word for Machine Learning, AI, neural networks, etc.

---
# Components of Machine Intelligence

- **Collect data (and lots and lots of it!)**

- Discern patterns

- Make predictions

---
# Data Collection

- How do we collect data?

- Where do we collect data from?

- Who do we collect data from?

---
# Problems with Data
- System timeouts that are trained on movement speeds of <q>typical</q> people
- Biometrics that cannot function on a person who isn't still for long enough
- Inference about people that doesn't account for height, stamina, range of motion, or AT use (e.g. wheelchairs)

When groups are historically
marginalized and underrepresented, this is
.quote[imprinted in the data that shapes AI
systems... Those who have borne discrimination in the past are most at risk of harm from
biased and exclusionary AI in the present. (Whittaker, 2019)] 

--
This can cascade -- e.g. measurement bias can exacerbate bias downstream. For example, facial mobility, emotion expression, and facial structure impact detection and identification of people; body motion and shape impact activity detection; etc. 

---
# How might we address bias/fairness in data sets?

We need to know it is there (aggregate metrics can hide performance problems in under-represented groups).

We need to be careful not to eliminate outliers, or reduce their influence, if doing so erases disabled people from the data, given the heterogeneity of disability data.
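
To make the first point concrete, here is a minimal sketch in plain Python (the group labels, true labels, and predictions are hypothetical) showing how a healthy-looking aggregate accuracy can hide a group for which the model fails completely; disaggregating the metric by group is what surfaces the problem.

```python
# Minimal sketch: aggregate accuracy can hide poor performance for an
# under-represented group. All labels and predictions are hypothetical.
from collections import defaultdict

examples = [
    # (group, true_label, predicted_label)
    ("majority", 1, 1), ("majority", 0, 0), ("majority", 1, 1),
    ("majority", 0, 0), ("majority", 1, 1), ("majority", 0, 0),
    ("majority", 1, 1), ("majority", 0, 0),
    ("minority", 1, 0), ("minority", 0, 1),  # both wrong
]

correct, total = defaultdict(int), defaultdict(int)
for group, truth, pred in examples:
    total[group] += 1
    correct[group] += int(truth == pred)

overall = sum(correct.values()) / sum(total.values())
print(f"aggregate accuracy: {overall:.2f}")                  # 0.80 -- looks fine
for group in total:
    print(f"{group}: {correct[group] / total[group]:.2f}")    # minority: 0.00
```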

---
# Approaches to measuring fairness 

We may need to rethink <q>fairness</q> in terms of individual rather than group outcomes, and define metrics that capture a range of concerns
- Movement speed might favor a wheelchair user
- Exercise variety might favor people who do not have chronic illness
- Exertion time might cover a wide variety of different types of people.

Defining such unbiased metrics requires careful thought and domain knowledge, and scientific research will be essential to defining appropriate procedures for this.

---
# Small Group Discussion [Post on Ed]({{site.discussion}}2514887)

Who might be excluded in the data set you found?

How was fairness measured in the data set you found, if it was discussed?

How would you go about testing for fairness in that data?


---
# Best Practices For Data Fairness

How do we motivate and ethically compensate disabled people to give their data?

What should we communicate at data collection time? 

Is the data collection infrastructure accessible? Does it protect sensitive information about participants adequately given the heterogeneous nature of disability? 

Does the metadata collected oversimplify disability? Who is labeling the data, and do they have biases affecting labeling?
  - Whittaker (2019) discusses the example of clickworkers who label people
as disabled <q>based on a hunch</q>. 

---
# Components of Machine Intelligence

- Collect data (and lots and lots of it!)

- **Discern patterns**

- Make predictions


---
# Two main approaches

![:fa eye] *Supervised learning* (we have lots of examples of what should be
 predicted)

![:fa eye-slash] *Unsupervised learning* (e.g. clustering into groups and inferring what
they are about)

![:fa low-vision] Can combine these (semi-supervised)

![:fa history]  Can learn over time or train up front

---
# Our Focus: Supervised Learning

![:fa eye] *Supervised learning* (we have lots of examples of what should be
 predicted)

![:fa eye-slash] *Unsupervised learning* (e.g. clustering into groups and inferring what
they are about)

![:fa low-vision] Can combine these (semi-supervised)

![:fa history]  Can learn over time or train up front

---
# Supervised Learning

.left-column50[
## Training Process

<div class="mermaid">
graph TD

L(Label) --> MI(Training Algorithm)
D(Input Data) -- Extract Features--> MI
MI --> C(Symbolic Predictor)

classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px;

class D,U,C,A,I,S,E,Act,Act2 normal
</div>
]


.right-column50[
## Extracting Features

Symbolic approaches require feature engineering (humans decide how to *summarize* data using features). They tend to be more *interpretable* (you can figure out why they make predictions); see the sketch below.

]
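
A minimal sketch of this symbolic pipeline, assuming a scikit-learn-style decision tree; the sleep-related features, values, and labels are hypothetical. The point is only that a human chose how to summarize the raw data before training, which keeps the resulting predictor inspectable.

```python
# Minimal sketch of the symbolic pipeline: hand-engineered features
# (chosen by a human) plus labels go into a training algorithm,
# which produces an interpretable predictor. Data is hypothetical.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features we *chose* to extract from raw sensor data:
# [hours_still_at_night, average_heart_rate]
X = [[7.5, 55], [6.0, 60], [1.0, 80], [0.5, 90], [8.0, 50], [2.0, 85]]
y = ["asleep", "asleep", "awake", "awake", "asleep", "awake"]  # labels

model = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Interpretability: print the learned decision rules.
print(export_text(model, feature_names=["hours_still", "avg_hr"]))

# Prediction step: the same features extracted from new data.
print(model.predict([[5.0, 62]]))
```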

---
# Supervised Learning

.left-column50[
## Training Process

<div class="mermaid">
graph TD

L(Label) --> MI(Training Algorithm)
D(Input Data) --> MI
MI --> C(Neural Predictor)

classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px;

class D,U,C,A,I,S,E,Act,Act2 normal
</div>
]


.right-column50[
## Designing Networks

Neural approaches (e.g. ChatGPT) use massive amounts of data to train a network according to base principles. Designing the right network is critical. We cannot be sure *why* they make the predictions they do.
]

---
# Supervised Learning

.left-column50[
## Training Process

<div class="mermaid">
graph TD

L(Label) --> MI(Training Algorithm)
D(Input Data) --> MI
MI --> C(Symbolic/Neural Predictor)

classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px;

class D,U,C,A,I,S,E,Act,Act2 normal
</div>
]


.right-column50[
## Prediction Process

<div class="mermaid">
graph TD

D(Input Data) --> C(Symbolic/Neural Predictor)
C --> P(Prediction)

classDef normal fill:#e6f3ff,stroke:#333,stroke-width:2px;

class P,D,U,C,A,I,S,E,Act,Act2 normal
</div>
]


---
# How do we Evaluate Predictors?

Compare to prior probabilities
- Probability before any observations (i.e., just guessing)
- Ex: an ML classifier to guess whether an animal is a cat or a ferret based on ear location
 - Assume all pointy-eared fuzzy creatures are cats (some percentage will be right)

Compare to simplistic algorithms
- Ex. Classifying cats vs ferrets based on size
- Your model needs to do better than these too

It's surprising how often this isn't done in published work or before deployment (a minimal baseline check is sketched below)
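
A minimal baseline check in plain Python, with hypothetical cat/ferret labels and predictions: compute the prior-probability baseline (always guess the most common class) and compare the model's accuracy against it.

```python
# Minimal sketch: compare a model's accuracy against the prior
# (always guess the majority class). Labels here are hypothetical.
from collections import Counter

true_labels = ["cat", "cat", "cat", "ferret", "cat", "ferret", "cat", "cat"]
model_preds = ["cat", "cat", "ferret", "ferret", "cat", "cat", "cat", "cat"]

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

# Prior-probability baseline: always guess the most common class.
majority = Counter(true_labels).most_common(1)[0][0]
baseline_preds = [majority] * len(true_labels)

print("prior baseline:", accuracy(baseline_preds, true_labels))  # 0.75
print("our model:     ", accuracy(model_preds, true_labels))     # 0.75
# The model is no better than just guessing "cat" every time.
```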

???
We did this to study gender's impact on academic authorship; doctors' reviews

---
# Adding Nuance
.left-column50[
## <q>Confusion Matrix</q>
![:img Confusion matrix of a machine learning model,100%, width](img/data-equity/ml-faulty.png)
]
.right-column50[
Don't just measure accuracy (percent correct)

Lots of other metrics based on false positives and negatives
- Precision = TP / (TP+FP). Intuition: of the items predicted positive, how many were right?
- Recall = TP / (TP+FN). Intuition: of all the items that should have been positive, how many were actually labeled correctly?
- ... many more (see the sketch below)
]
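
A minimal sketch of these metrics computed directly from confusion-matrix counts (the TP/FP/FN/TN values are hypothetical):

```python
# Minimal sketch: precision and recall from confusion-matrix counts.
# TP/FP/FN/TN values are hypothetical.
TP, FP, FN, TN = 40, 10, 20, 30

accuracy  = (TP + TN) / (TP + FP + FN + TN)  # 0.70: percent correct overall
precision = TP / (TP + FP)                   # 0.80: of predicted positives, how many right?
recall    = TP / (TP + FN)                   # ~0.67: of actual positives, how many found?

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```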

---
# Using Proper Methods

.left-column50[
**Symbolic Methods Can Easily Overfit**

When your ML model is too specific to the data you have, it might not generalize well

The best test is a data set you haven't seen before (see the sketch below)

![:img overfitting is illustrated as a line snaking between data points to minimize error instead of smoothly rising among them , 80%,width](img/data-equity/overfitting.png)

]

.right-column50[
**Neural Methods Can Have Hidden Biases** 

![:img A headline from the Verge stating that Twitter taught Microsoft's AI Chatbot to be a racist asshole in less than a day, 80%,width](img/data-equity/racist-chatbot.png)

]
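
A minimal sketch of the held-out-data check mentioned on the left, assuming scikit-learn and synthetic data: a large gap between training accuracy and test accuracy is a warning sign that the model has overfit.

```python
# Minimal sketch: hold out data you haven't trained on, then compare
# training accuracy to test accuracy. A big gap suggests overfitting.
# The synthetic data and model choice here are hypothetical.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# An unconstrained tree can memorize the training data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))  # often 1.0
print("test accuracy: ", model.score(X_test, y_test))    # noticeably lower
```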

---
# Disability Biases to Watch Out For

Norms are baked deeply into algorithms which are designed to learn about the most common cases. As human judgment is increasingly replaced by biometrics, *norms* become more strictly enforced. 
- Do outliers face higher error rates? 
- Do they disproportionately represent and misrepresent people with disability?
- How does this impact allocation of resources?

---
# How does norming harm people with disabilities?

Machine intelligence is already being used to track the allocation of assistive technologies, from CPAP machines for people with sleep apnea (Araujo 2018) to prosthetic legs (as described by Jillian Weise in
Granta and uncovered in Whittaker et al 2019), deciding who is <q>compliant enough</q> to deserve them.

Technology may also fail to recognize that a disabled person is even present (Kane, 2020),  thus <q>demarcating what it means to be a legible human and
whose bodies, actions, and lives fall outside... [and] remapping and calcifying the boundaries
of inclusion and marginalization</q> (Whittaker, 2019). 

---
# How does norming harm people with disabilities?

Many biometric systems gatekeep access based on either individual identity, identity as a human, or class of human, such as <q>old enough to buy cigarettes.</q>
Examples:
- a participant having to falsify data because <q>some apps [don’t allow] my height/weight combo for my age.</q> (Kane, 2020)
- a person who must ask a stranger to ‘forge’ a signature at the grocery store <q>... because I can’t reach [the tablet]</q> (Kane, 2020)
- at work, activity tracking may define <q>success</q> in terms that exclude disabled workers (it may also increase the likelihood of work-related disability by forcing workers to work at maximal efficiency)
---
# Components of Machine Intelligence

- Collect data (and lots and lots of it!)

- Discern patterns

- **Make predictions**

---
# Concerns at Prediction Time

Denial of insurance and medical care, or threats to employment (Whittaker, 2019, p. 21).
- HireVue, an AI-based video interviewing company, has a patent on file to detect disability (Larsen, 2018).
- This is illegal under the ADA, which
   - forbids asking about disability
status in a hiring process (42 U.S.C. § 12112(a)) 
   - forbids <q>using qualification
standards, employment tests or other selection criteria that screen out or tend to screen out
an individual with a disability</q> (42 U.S.C. § 12112(b)(6)). 

---
# Concerns at Prediction Time

Denial of insurance and medical care, or threats to employment

Disability identification 
- Examples: detecting Parkinson's from gait (Das, 2012) and mouse movement (Youngmann, 2019); detecting autism from home videos (Leblanc, 2020).
- What are the ethics of doing this without consent?
- Many of these algorithms encode medical model biases

Relatedly, failure to identify disability 
- Legally under the ADA, if you are treated as disabled, you are disabled. Yet biometrics cannot detect how people are treated. 

---
# Concerns at Prediction Time

Denial of insurance and medical care, or threats to employment

Disability Identification / Failure to Identify

Apps that Harm
- Example: Training behaviors in <q>support</q> of autistic individuals without regard to debates about agency and independence of the target audience [Demo, 2017]; 
- As with regular accessibility apps, AI-based apps can harm, be disability dongles, etc.
- As with regular apps, AI-based apps may not be accessible

---

# Concerns at Prediction Time

![:img Three news headlines-- On Orbitz Mac Users Steered to Pricier Hotels; Google's algorithm shows prestigious job ads to men but not to women; Racial bias alleged in Google's ad results, 60%,width](img/data-equity/bias.png)

---
# Concerns at Prediction Time
Denial of insurance and medical care, or threats to employment

Disability Identification / Failure to Identify

Apps that Harm

AI with Baked in Biases
- Consequences of biased data and lack of control over training results are more nuanced than just accuracy (as with the headlines we just read)
- Privacy can also be a concern. 
  - For rare conditions, an algorithm may learn
to recognize the disability, rather than the individual, reducing
security when used for access control, allowing multiple people with
similar impairments to access the same data.

---
# Concerns at Prediction Time

Denial of insurance and medical care, or threats to employment

Disability Identification / Failure to Identify

Apps that Harm

AI with Baked in Biases

Transparency and Accountability
- Power differences between builders and users 
- Representation of disabled people among builders
- Algorithms that are not *interpretable* or *correctable*
- Users of algorithms who use them to enforce larger societal harms

---
# Small Group Discussion [Post on Ed]({{site.discussion}}2515387)

Revisit the data set you chose

Do you know what sort of predictions it was used for, if any?

What possible harms could be done with those predictions?

Reminder of our list
- Denial of insurance and medical care, or threats to employment
- Disability Identification / Failure to Identify
- Apps that Harm
- AI with Baked in Biases
- Transparency and Accountability


---
# End of Deck