6.4.1.3 - Importance of Data Work

 
💡
Data Work is more important than Model Work

Putting the data before the algorithm in big data addressing personalized healthcare

Algorithms have the potential to worsen disparities currently intrinsic to the contemporary healthcare system, including racial biases.
 
Blame for these deficiencies has often been placed on the algorithm—but the underlying training data bears greater responsibility for these errors, as biased outputs are inexorably produced by biased inputs.
 
The utility, equity, and generalizability of predictive models depend on population-representative training data with robust feature sets.
 
This may be conceptualized as clinical decision questioning, intended to liberate the human predictive process from preconceived lenses in data solicitation and/or interpretation.
 
Awareness of data deficiencies, structures for data inclusiveness, strategies for data sanitation, and mechanisms for data correction can help realize the potential of big data for a personalized medicine era.
 
Applied deliberately, these considerations could help mitigate the risk of perpetuating health inequity amid widespread adoption of novel big data applications.
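One way to make "population-representative training data" concrete is to compare each group's share of the training set with its share of the population the model is meant to serve. The sketch below is only an illustration: the function name `representation_gaps`, the group labels, the reference shares, and the 5-percentage-point tolerance are all assumptions and do not come from the paper.

```python
from collections import Counter


def representation_gaps(train_groups, reference_shares, tolerance=0.05):
    """Flag groups whose share of the training data deviates from a reference.

    train_groups: one group label per training record (hypothetical labels).
    reference_shares: dict mapping group -> expected share in the target
        population (values should sum to roughly 1.0).
    tolerance: absolute difference in share that is tolerated (assumed value).
    Returns a dict of group -> (observed share - expected share) for every
    group outside the tolerance; negative values mean under-representation.
    """
    counts = Counter(train_groups)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("train_groups is empty")

    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps


# Made-up numbers, for illustration only.
train_groups = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
reference_shares = {"A": 0.60, "B": 0.25, "C": 0.15}
gaps = representation_gaps(train_groups, reference_shares)
print(gaps)
```

A check like this only surfaces sampling imbalance. It says nothing about whether the features or labels recorded for each group are themselves biased, which the passage above also treats as a data problem.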
 
💡
Data Needs to be Available, Usable, Readable, Relevant and Presentable
 

"Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI

AI models are increasingly applied in high-stakes domains like health and conservation.
Data quality carries elevated significance in high-stakes AI because of its heightened downstream impact on predictions in areas such as cancer detection, wildlife poaching, and loan allocation.
Paradoxically, data is the most undervalued and deglamorized aspect of AI.
Data cascades are compounding events that cause negative downstream effects from data issues, triggered by conventional AI/ML practices that undervalue data quality.
Data cascades (technical debt over time) are pervasive (92% prevalence), invisible, delayed, but often avoidable. We discuss HCI opportunities in designing and incentivizing data excellence as a first-class citizen of AI, resulting in safer and more robust systems for all.
 

Crowdsourcing Medical Labeling - Not Done by Professionals

Data quality is critical to establishing "ground truth" in high-stakes domains like healthcare.
“... Ms. Pradhan was looking for polyps, small growths in the large intestine that could lead to cancer. When she found one — they look a bit like a slimy, angry pimple — she marked it with her computer mouse and keyboard, drawing a digital circle around the tiny bulge. She was not trained as a doctor, but she was helping to teach an artificial intelligence system that could eventually do the work of a doctor”
Why are physicians not teaching AI?
 

Quality Control Challenges in Crowdsourcing Medical Labeling and Diagnosis

Crowdsourcing has enabled the collection, aggregation and refinement of human knowledge and judgment, i.e. ground truth, for problem domains with data of increasing complexity and scale.
This scale of ground truth data generation, especially towards the development of machine learning based medical applications that require large volumes of consistent diagnoses, poses significant and unique challenges to quality control.
Poor quality control in crowdsourced labeling of medical data can result in undesired effects on patients' health.
We study medicine-specific quality control problems, including the diversity of grader expertise and the ambiguity of diagnosis guidelines, in novel datasets of three eye diseases.
We present analytical findings on physicians' work patterns, evaluate existing quality control methods that rely on task completion time to circumvent the scarcity and cost problem of generating ground truth medical data, and share our experiences with a real-world system that collects medical labels at scale.
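To illustrate the kind of quality control discussed above, here is a minimal sketch that combines two common ideas: discarding submissions completed implausibly fast, and aggregating the remaining labels by majority vote with an agreement score. It is not the system described in the paper. The function name `aggregate_labels`, the tuple layout, and both thresholds are assumptions chosen for illustration, and the abstract evaluates time-based methods rather than endorsing them.

```python
from collections import Counter, defaultdict


def aggregate_labels(annotations, min_seconds=2.0, min_agreement=0.7):
    """Aggregate crowd labels per item by majority vote.

    annotations: iterable of (item_id, grader_id, label, seconds_spent).
    min_seconds: submissions faster than this are dropped (assumed threshold).
    min_agreement: items whose majority share falls below this are flagged
        for expert review (assumed threshold).
    Ties are resolved in favour of the label seen first. Items whose
    submissions were all dropped do not appear in the result.
    """
    votes = defaultdict(list)
    for item_id, _grader_id, label, seconds in annotations:
        if seconds >= min_seconds:  # drop implausibly fast submissions
            votes[item_id].append(label)

    results = {}
    for item_id, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        results[item_id] = {
            "label": label,
            "agreement": agreement,
            "needs_review": agreement < min_agreement,
        }
    return results


# Made-up annotations, for illustration only.
annotations = [
    ("img1", "g1", "disease", 12.0),
    ("img1", "g2", "disease", 9.5),
    ("img1", "g3", "normal", 0.4),   # too fast, dropped
    ("img2", "g1", "disease", 8.0),
    ("img2", "g2", "normal", 7.0),
    ("img2", "g3", "normal", 11.0),
]
results = aggregate_labels(annotations)
for item_id, summary in results.items():
    print(item_id, summary)
```

Majority vote treats every grader as equally reliable, which is exactly the assumption that the diversity of grader expertise calls into question. Low-agreement items are therefore routed to review instead of being accepted automatically.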
 
 
 

Crowdsourcing human-based computation for medical image analysis: A systematic literature review

A systematic literature review of studies on crowdsourcing human-based computation for medical image analysis concluded: "Crowdsourcing human-based computation systems can successfully solve medical image analysis problems."
I DISAGREE. We need federal-level requirements to ensure transparency in the data collection and labeling process.
 
 

Preparing Medical Imaging Data for Machine Learning

Preparing data is a costly and time-intensive process. As a result, algorithms are often developed on limited data and end up with limited utility and poor generalization.
Fundamental steps for preparing medical imaging data in AI algorithm development
 
Diagram shows process of medical image data handling.
 
 
 
Diagram shows value hierarchy of imaging annotation.
Most useful but least abundant is ground truth (pathologic, genomic, or clinical outcome data).
A Living Pocketbook of Neurology

Applied, Concise, Practical, Up-to-date, Mobile-friendly, peer-reviewed & Open-Access Pocketbook of Neurology and related clinical specialties
Unleash the Digital Healer in You! >> 🌐 https://junaidkalia.com <<
Learn Digital Health, AI-in-Healthcare, ValueBased Care & More!
🎯 Follow Junaid Kalia MD [Editor-in-Chief]
"If anyone saved a life, it would be as if he saved the life of all mankind"
An AINeuroCare Academy Project - Coaching Clinicians in Digital Healthcare

Disclaimer: For Education purposes only. You must NOT rely on the information on this website as an alternative to medical advice from your healthcare provider

CC BY-NC-ND - Attribution-NonCommercial-NoDerivs - AINeuroCare PLLC ©️