Mapping out the DRAPAC26 Submission proposal entries by numbers and topics
The Digital Rights in Asia-Pacific Assembly (DRAPAC) 2026 is an important gathering of digital rights advocates, policymakers, activists, and technologists dedicated to discussing and addressing pressing digital rights issues in the Asia-Pacific region. I'm very excited for this year's DRAPAC, and I am curious about what organizations and individuals are proposing for their sessions and how they frame their ideas. I believe it is important to look over all the ideas and get the gist of them, to understand how the narrative has evolved and what the interests and demands of the submitters are.
To achieve that, I gathered (scraped) all entries using my personal DRAPAC account and counted all the metadata manually myself (with the help of LLMs/AI). The organizing team did not give me the data, and I have informed them that I ran a scraper.
This year, DRAPAC26 focuses on 2 themes (Track 1: Co-creating shared resources and Track 2: Collaborative action across movements) and 8 session formats. The submission period is closed, and it drew 393 entries. So, let's unfold this further by the numbers.
Thematic tracks:
| Track | Sessions | Share |
|---|---|---|
| Co-creating shared resources | 227 | 57.8% |
| Collaborative action across movements | 163 | 41.5% |
| Untagged | 3 | 0.8% |
Session Formats
| Format | Total | Share |
|---|---|---|
| Co-learning workshop | 122 | 31.0% |
| Panel discussion | 104 | 26.5% |
| Roundtable discussion | 62 | 15.8% |
| Ideation workshop | 56 | 14.2% |
| Exhibit / display / performance | 20 | 5.1% |
| Social gathering | 13 | 3.3% |
| Booth, stand, or clinic | 9 | 2.3% |
| Parallel event | 7 | 1.8% |
Track × Format Matrix
| Format | Co-creating | Collaborative |
|---|---|---|
| Co-learning workshop | 91 | 30 |
| Panel discussion | 43 | 59 |
| Roundtable discussion | 27 | 35 |
| Ideation workshop | 42 | 14 |
| Exhibit / display / performance | 13 | 7 |
| Social gathering | 3 | 10 |
| Booth, stand, or clinic | 5 | 4 |
| Parallel event | 3 | 4 |
As the data above shows, the "Co-creating shared resources" track leads the "Collaborative action across movements" track by 16.3 percentage points (57.8% vs 41.5%). This highlights the high interest in this track, which focuses on building shared infrastructure (from platforms, systems, and databases to skills, networks, and strategies) that can be scaled across the region.
Track 2 (Collaborative action across movements) still accounts for more than two in five submissions. It focuses on bridging the gap between civil society, government, and the private sector to co-develop rights-based digital policies, and it stresses the importance of discussions pertaining to these issues.
Looking at the track-by-format matrix, each track leans toward the session formats that best suit its discussions or activities. Track 1 favors co-learning workshops and ideation workshops, while Track 2 leans more toward panel discussions, roundtable discussions, and social gatherings.
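Because the two tracks differ in size, raw counts can hide how strongly each track leans toward a format. The sketch below converts the matrix above into within-track shares. It is illustrative only: the variable and function names are made up here and are not taken from the original scripts.

```python
# Track x Format counts copied from the matrix above: (Co-creating, Collaborative).
matrix = {
    "Co-learning workshop": (91, 30),
    "Panel discussion": (43, 59),
    "Roundtable discussion": (27, 35),
    "Ideation workshop": (42, 14),
    "Exhibit / display / performance": (13, 7),
    "Social gathering": (3, 10),
    "Booth, stand, or clinic": (5, 4),
    "Parallel event": (3, 4),
}

# Column totals give the number of tagged sessions per track.
co_total = sum(co for co, _ in matrix.values())
collab_total = sum(cl for _, cl in matrix.values())

def within_track_shares(matrix, co_total, collab_total):
    """Return {format: (Co-creating %, Collaborative %)}, rounded to one decimal."""
    return {
        fmt: (round(100 * co / co_total, 1), round(100 * cl / collab_total, 1))
        for fmt, (co, cl) in matrix.items()
    }

shares = within_track_shares(matrix, co_total, collab_total)
for fmt, (co_pct, cl_pct) in shares.items():
    print(f"{fmt:<34} {co_pct:>5}%  {cl_pct:>5}%")
```

Read this way, co-learning workshops make up roughly 40% of Track 1 but only about 18% of Track 2, while panels account for about 36% of Track 2.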
Furthermore, to get a better understanding of what people are proposing, I ran basic topic modeling on the submission content to see which ideas people want to focus on or discuss. I used a topic modeling technique called LDA (Latent Dirichlet Allocation) to uncover the central topics and how they are distributed across the sessions' content. The following table shows the most prominent words in each topic.
| Topic | Sessions | Top Words (topn=12) |
|---|---|---|
| T0: AI Systems & Public Accountability | 65 (16.5%) | data, social, public, platforms, media, surveillance, systems, algorithmic, platform, youth, people, power |
| T1: Civil Society Digital Security Capacity | 9 (2.3%) | human, security, strategies, civil, society, technical, law, training, myanmar, challenges, action, cybersecurity |
| T2: Regional Governance & Policy | 110 (28.0%) | regional, civil, governance, policy, society, shared, platform, data, systems, online, public, asia-pacific |
| T3: Digital Rights Practice & Experience | 65 (16.5%) | space, data, human, shared, climate, people, online, open, collective, building, work, challenges |
| T4: Community Organizing & Grassroots Capacity | 53 (13.5%) | communities, shared, systems, community, collective, movement, regional, movements, asia-pacific, end, to:, grassroots |
| T5: Legal Frameworks, Evidence & Platform Accountability | 32 (8.1%) | legal, human, evidence, platform, security, platforms, challenges, international, regional, resource, criminal, media |
| T6: Digital Security Tools & Practical Response | 20 (5.1%) | human, online, legal, security, collective, right, practical, tools, strategies, session, decision, response |
| T7: Youth, Media & Narrative Resistance | 39 (9.9%) | internet, civic, movements, people, young, media, communication, social, online, collective, narratives, storytelling |
Moreover, let's see which countries or regions are mentioned most across the proposals. The country counts come from a rule-based keyword match over the session Title and Content fields. A session is counted once per country if any variant of the country's name is present.
| Country | Co-creating | Collaborative | Total |
|---|---|---|---|
| Pacific (region) | 112 | 83 | 196 |
| Indonesia | 67 | 45 | 114 |
| Philippines | 51 | 44 | 96 |
| India | 56 | 50 | 106 |
| Nepal | 41 | 32 | 73 |
| Myanmar | 31 | 19 | 50 |
| Bangladesh | 30 | 18 | 48 |
| Pakistan | 11 | 14 | 26 |
| Malaysia | 19 | 7 | 26 |
| Sri Lanka | 13 | 7 | 20 |
| Thailand | 9 | 4 | 14 |
| Taiwan | 10 | 3 | 13 |
| Singapore | 6 | 6 | 12 |
| Vietnam | 5 | 4 | 9 |
| China | 2 | 5 | 7 |
| Cambodia | 4 | 3 | 7 |
| Korea | 3 | 3 | 6 |
| Japan | 2 | 3 | 5 |
| Australia | 1 | 2 | 4 |
| Papua | 2 | 0 | 2 |
| Timor | 1 | 0 | 1 |
| Laos | 1 | 0 | 1 |
Lastly, I ran a word-matching script to see if AI-related keywords were mentioned in the content of the proposals. It turns out that almost 30% of the entries include AI-related terms, as follows:
| Term | Sessions | Share |
|---|---|---|
| "ai" (word-boundary) | 94 | 23.9% |
| "artificial intelligence" (substring) | 14 | 3.6% |
| "algorithmic" (word-boundary) | 36 | 9.2% |
| "algorithm" (word-boundary) | 7 | 1.8% |
| "genai" (substring) | 1 | 0.3% |
| Union (ai OR algorithmic OR algorithm OR genai) | 117 | 29.8% |
It is fascinating to see diverse ideas being submitted. We certainly won't see most of them being presented at DRAPAC, but I'm sure the ideas are worth spreading and must be taken into account, especially for individuals and organizations within the region and beyond.
To close, most of the scripts/code that helped me generate the data above were written by LLMs and agentic AI platforms. I manually reviewed and adjusted some of them for cross-checking. I attached the full analysis below, generated 100% by AI. The wording above was mostly written by me. I am planning to update this post to include all the scraped sessions and the generated code, for reproducibility and further experimentation.
DRAPAC 26 — Session Analysis
Source: drap.ac/26/activities/
Scraped: Wave 1 = 2026-03-29 (311 files); Wave 2 = 2026-03-30 (+82 files, total 393)
Total sessions: 393 markdown files
Vault source: D-ARCHIVES/DRAPAC 26 Sessions Submission/
Executive Summary
DRAPAC 26 is the Digital Rights in Asia-Pacific Assembly 2026, a gathering of digital rights advocates, policymakers, activists, and technologists from the Asia-Pacific region. Three methods were applied to the 393 session Content fields:
- Unsupervised LDA topic modeling (k=8, k=12) — discovers latent themes statistically
- UMAP + HDBSCAN clustering — semantic clustering of sessions by document-topic similarity
- Content coverage clusters (keyword-based) — comparative reference
Key findings:
- LDA identifies Regional Governance & Policy (T2, 110 sessions, 28%) as the single largest latent topic, well ahead of the next two topics (T0 and T3, 65 sessions each). This is the programme's backbone discourse.
- Semantic clustering discovers 7 distinct session communities, including "Data, AI & Platform Accountability", "Legal Evidence & Journalism", "Community Infrastructure Design", "Civil Society Governance AI", "Open Space & People", and "Collective Care & Movement", each with its own track composition.
- Track differentiation is real but nuanced: Collaborative skews toward AI and surveillance discourse (T0, +8pp) and Digital Rights Practice & Experience (T3, +4.5pp); Co-creating skews toward collective care movements (C5, 80% Co-creating) and community infrastructure (C2, 61% Co-creating), and also leads on Regional Governance & Policy (T2, +5pp). Legal/evidence (T5) is evenly split between the tracks (7.9% vs 8.0%).
- AI-related terms ("ai", "algorithmic", "algorithm", "genai") appear in 29.8% of sessions, making AI one of the most prominent substantive concerns alongside governance.
Overview
DRAPAC 26 is the Digital Rights in Asia-Pacific Assembly 2026. The vault contains 393 scraped session submissions across two waves: 311 files scraped 2026-03-29 and 82 additional files from 2026-03-30.
Theme analysis design: Three complementary unsupervised methods were applied to the Content field (session description body only). The Content field was preprocessed: lowercased, stopwords removed, markdown stripped, and tokenised (min word length 3, no punctuation-only tokens). Gensim LDA was used for topic modeling; UMAP + HDBSCAN for semantic clustering; content coverage analysis (keyword clusters) as comparative reference.
All theme statistics are Content-only — Organiser and Title fields are excluded to avoid inflation from personal specialisations and session labels.
Thematic Tracks
DRAPAC 26 has two thematic tracks:
| Track | Sessions | Share |
|---|---|---|
| Co-creating shared resources | 227 | 57.8% |
| Collaborative action across movements | 163 | 41.5% |
| Untagged | 3 | 0.8% |
Session Formats
| Format | Total | Share |
|---|---|---|
| Co-learning workshop | 122 | 31.0% |
| Panel discussion | 104 | 26.5% |
| Roundtable discussion | 62 | 15.8% |
| Ideation workshop | 56 | 14.2% |
| Exhibit / display / performance | 20 | 5.1% |
| Social gathering | 13 | 3.3% |
| Booth, stand, or clinic | 9 | 2.3% |
| Parallel event | 7 | 1.8% |
Track × Format Matrix
| Format | Co-creating | Collaborative |
|---|---|---|
| Co-learning workshop | 91 | 30 |
| Panel discussion | 43 | 59 |
| Roundtable discussion | 27 | 35 |
| Ideation workshop | 42 | 14 |
| Exhibit / display / performance | 13 | 7 |
| Social gathering | 3 | 10 |
| Booth, stand, or clinic | 5 | 4 |
| Parallel event | 3 | 4 |
Countries & Regional Focus
| Country | Co-creating | Collaborative | Total |
|---|---|---|---|
| Pacific (region) | 112 | 83 | 196 |
| Indonesia | 67 | 45 | 114 |
| Philippines | 51 | 44 | 96 |
| India | 56 | 50 | 106 |
| Nepal | 41 | 32 | 73 |
| Myanmar | 31 | 19 | 50 |
| Bangladesh | 30 | 18 | 48 |
| Pakistan | 11 | 14 | 26 |
| Malaysia | 19 | 7 | 26 |
| Sri Lanka | 13 | 7 | 20 |
| Thailand | 9 | 4 | 14 |
| Taiwan | 10 | 3 | 13 |
| Singapore | 6 | 6 | 12 |
| Vietnam | 5 | 4 | 9 |
| China | 2 | 5 | 7 |
| Cambodia | 4 | 3 | 7 |
| Korea | 3 | 3 | 6 |
| Japan | 2 | 3 | 5 |
| Australia | 1 | 2 | 4 |
| Papua | 2 | 0 | 2 |
| Timor | 1 | 0 | 1 |
| Laos | 1 | 0 | 1 |
Method 1: LDA Topic Modeling
Design
Latent Dirichlet Allocation (LDA) discovers latent topics as probability distributions over words — sessions are not hard-assigned to topics but have a probability distribution across all topics. k is the number of topics the model is constrained to find; you choose it before running the model. Gensim LDA was used with:
- Dictionary: tokenised Content, stopwords removed (custom stoplist + Gensim `STOPWORDS`), min doc frequency 5, max doc frequency 65%
- Two runs: `k=8` (primary, 40 passes) and `k=12` (for HDBSCAN input, 30 passes)
- Alpha/eta: `auto`, so the model learns document-topic and topic-word sparsity from the data
LDA Topics Discovered (k=8)
Each topic is a ranked list of words. The number in parentheses is the number of sessions where this topic is the dominant topic (highest probability in the k=8 model).
| Topic | Sessions | Top Words (topn=12) |
|---|---|---|
| T0: AI Systems & Public Accountability | 65 (16.5%) | data, social, public, platforms, media, surveillance, systems, algorithmic, platform, youth, people, power |
| T1: Civil Society Digital Security Capacity | 9 (2.3%) | human, security, strategies, civil, society, technical, law, training, myanmar, challenges, action, cybersecurity |
| T2: Regional Governance & Policy | 110 (28.0%) | regional, civil, governance, policy, society, shared, platform, data, systems, online, public, asia-pacific |
| T3: Digital Rights Practice & Experience | 65 (16.5%) | space, data, human, shared, climate, people, online, open, collective, building, work, challenges |
| T4: Community Organizing & Grassroots Capacity | 53 (13.5%) | communities, shared, systems, community, collective, movement, regional, movements, asia-pacific, end, to:, grassroots |
| T5: Legal Frameworks, Evidence & Platform Accountability | 32 (8.1%) | legal, human, evidence, platform, security, platforms, challenges, international, regional, resource, criminal, media |
| T6: Digital Security Tools & Practical Response | 20 (5.1%) | human, online, legal, security, collective, right, practical, tools, strategies, session, decision, response |
| T7: Youth, Media & Narrative Resistance | 39 (9.9%) | internet, civic, movements, people, young, media, communication, social, online, collective, narratives, storytelling |
Interpretation: The largest latent topic is Regional Governance & Policy (T2, 28%) — sessions on civil society's role in internet governance, regulatory frameworks, and regional policy coordination. T0 (AI Systems & Public Accountability, 16.5%) is the second major focus: AI audits, platform surveillance, and public sector accountability. T3 (Digital Rights Practice & Experience, 16.5%) is a broad catch-all for sessions focused on practice and experience across accessibility, OGBV, climate data, and community care. T4 (Community Organizing & Grassroots Capacity, 13.5%) covers collective infrastructure: queer communities, trauma-informed care, peer exchange. T7 (Youth, Media & Narrative Resistance, 9.9%) is a distinct internet-culture-and-activism cluster. T1 and T6 are smaller but substantive: CSO security infrastructure and practical digital security tools respectively.
LDA Topic × Track
| Topic | Co-creating | Collaborative |
|---|---|---|
| T2: Regional Governance & Policy | 69 (30.4%) | 41 (25.2%) |
| T0: AI Systems & Public Accountability | 30 (13.2%) | 35 (21.5%) |
| T3: Digital Rights Practice & Experience | 33 (14.5%) | 31 (19.0%) |
| T4: Community Organizing & Grassroots Capacity | 31 (13.7%) | 22 (13.5%) |
| T7: Youth, Media & Narrative Resistance | 28 (12.3%) | 10 (6.1%) |
| T5: Legal Frameworks, Evidence & Platform Accountability | 18 (7.9%) | 13 (8.0%) |
| T6: Digital Security Tools & Practical Response | 14 (6.2%) | 6 (3.7%) |
| T1: Civil Society Digital Security Capacity | 4 (1.8%) | 5 (3.1%) |
Key track differentiators:
- Co-creating leads on: T2 Regional Governance & Policy (+5pp), T7 Youth, Media & Narrative Resistance (+6pp), T6 Digital Security Tools & Practical Response (+2.5pp)
- Collaborative leads on: T0 AI Systems & Public Accountability (+8pp), T3 Digital Rights Practice & Experience (+4.5pp)
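The percentage-point gaps quoted above follow directly from the Topic × Track counts. A minimal sketch of that arithmetic, assuming the track totals of 227 and 163 tagged sessions; the names `topic_counts` and `pp_gap` are hypothetical:

```python
# Dominant-topic counts per track, copied from the table above:
# (Co-creating, Collaborative).
topic_counts = {
    "T2": (69, 41),
    "T0": (30, 35),
    "T3": (33, 31),
    "T4": (31, 22),
    "T7": (28, 10),
    "T5": (18, 13),
    "T6": (14, 6),
    "T1": (4, 5),
}
CO_TOTAL, COLLAB_TOTAL = 227, 163  # tagged sessions per track

def pp_gap(co, collab):
    """Co-creating share minus Collaborative share, in percentage points."""
    return 100 * co / CO_TOTAL - 100 * collab / COLLAB_TOTAL

gaps = {topic: round(pp_gap(co, cl), 1) for topic, (co, cl) in topic_counts.items()}

# Positive values: Co-creating leads. Negative values: Collaborative leads.
for topic, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{topic}: {gap:+.1f} pp")
```

Comparing shares rather than raw counts matters here because Co-creating has more sessions overall; T5, for example, has more Co-creating sessions in absolute terms but essentially no gap in share.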
Method 2: UMAP + HDBSCAN Semantic Clustering
Design
UMAP (Uniform Manifold Approximation and Projection) reduces the doc-topic matrix (12 LDA topics) to 2 dimensions using cosine distance, preserving semantic neighbourhood structure. HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) then clusters the 2D embedding, identifying dense regions as clusters and sparse regions as noise — without requiring a fixed number of clusters.
Parameters: UMAP (n_neighbors=15, min_dist=0.1, metric=cosine) → HDBSCAN (min_cluster_size=25, metric=euclidean, eom selection).
Clusters Discovered
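7 clusters + noise (60 sessions, 15.3%). Each cluster is characterised by its dominant LDA topic, top TF-IDF terms, and track composition. Because the clustering input is the k=12 model, T8 and T9 in the table refer to topics that exist only in that model.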
| Cluster | Sessions | Dominant LDA Topic | Co-creating | Collaborative | Label |
|---|---|---|---|---|---|
| C4 | 86 (21.9%) | T3: Digital Rights Practice & Experience | 44 (51%) | 40 (47%) | Open Space & People |
| C0 | 59 (15.0%) | T0: AI Systems & Public Accountability | 29 (49%) | 30 (51%) | Data, AI & Platform Accountability |
| C3 | 54 (13.7%) | T2: Regional Governance & Policy | 27 (50%) | 27 (50%) | Civil Society Governance AI |
| C5 | 46 (11.7%) | T4: Community Organizing & Grassroots | 37 (80%) | 9 (20%) | Collective Care & Movement |
| C6 | 31 (7.9%) | T9: Systems & Accountability | 14 (45%) | 17 (55%) | Country-Specific Implementation |
| C1 | 29 (7.4%) | T5: Legal & Evidence | 17 (59%) | 11 (38%) | Legal Evidence & Journalism |
| C2 | 28 (7.1%) | T8: Infrastructure Design | 17 (61%) | 11 (39%) | Community Infrastructure Design |
| Noise | 60 (15.3%) | — | 42 (70%) | 18 (30%) | — |
Cluster Profiles
C4 — Open Space & People (86 sessions, 21.9%)
The largest cluster. Sessions focus on people, internet, online space, media access, and open tools — session design, community digital access, and open digital commons. Nearly equal track split.
Top TF-IDF: session, people, internet, online, space, rights, media, access, social, open, tools
LDA: T3 "space | human | data | people | open"
C0 — Data, AI & Platform Accountability (59 sessions, 15.0%)
Sessions discuss data governance, AI, surveillance, platform accountability, and government oversight of digital platforms. Equal track split — this is a shared concern.
Top TF-IDF: data, ai, rights, surveillance, platforms, public, governments, governance, systems, accountability
LDA: T0 "data | public | surveillance | platforms"
C3 — Civil Society Governance AI (54 sessions, 13.7%)
Sessions focus on AI governance, civil society engagement with AI systems, and governance policy frameworks. Equal track split — this is a programmatic centrepiece.
Top TF-IDF: ai, governance, rights, data, systems, civil, civil society, content, society, digital rights
LDA: T2 "civil | data | governance | policy"
C5 — Collective Care & Movement (46 sessions, 11.7%)
The clearest Co-creating-skewed cluster (80% Co-creating, only 20% Collaborative). Sessions focus on collective care, movement infrastructure, activists' wellbeing, and digital security support for movements. This is the social-justice-organising heart of the programme.
Top TF-IDF: care, collective, support, communities, digital security, movement, activists, space, movements, online
LDA: T4 "shared | collective | movements | movement"
C6 — Country-Specific Implementation (31 sessions, 7.9%)
Sessions focused on country-level digital rights implementation, accountability mechanisms in India, Nepal, Philippines, and Indonesia — regulatory frameworks, remedies, and community digital systems at country level.
Top TF-IDF: systems, communities, indonesia nepal, digital systems, nepal, philippines indonesia, india philippines, remedy, centre
LDA: T9 "systems | communities | accountability | community"
C1 — Legal Evidence & Journalism (29 sessions, 7.4%)
Sessions on human rights evidence, criminal and international law, and journalist protection. A specialist legal cluster that skews Co-creating (59%).
Top TF-IDF: evidence, human, criminal, human rights, legal, rights, journalists, platforms, ai
LDA: T5 "legal | human | evidence | platforms | criminal"
C2 — Community Infrastructure Design (28 sessions, 7.1%)
Sessions focused on designing resources, frameworks, and infrastructure for communities — practical Co-creating sessions about building shared tools and frameworks.
Top TF-IDF: resources, design, session, community, framework, data, infrastructure, regional, platform, user, shared
LDA: T8 "shared | platform | regional | infrastructure | practical"
AI Presence (Content-Only)
AI detection uses case-insensitive matching against the Content field only (not Title or Organiser). Five patterns are tracked separately, each using the appropriate match type:
```python
import re

ai_re = re.compile(r'\bai\b', re.IGNORECASE)                      # word-boundary
artif_re = re.compile(r'artificial intelligence', re.IGNORECASE)  # phrase (no word boundary)
algo_re = re.compile(r'\balgorithmic\b', re.IGNORECASE)           # word-boundary
algow_re = re.compile(r'\balgorithm\b', re.IGNORECASE)            # word-boundary
genai_re = re.compile(r'genai', re.IGNORECASE)                    # substring: GenAI, genai-based

# union = \bai\b OR \balgorithmic\b OR \balgorithm\b OR genai
```
Edge cases:
- Hyphenated forms ("AI-powered", "AI-driven"): 37 of 38 already caught by `\bai\b` (the hyphen is a non-word character, so "AI" still sits between word boundaries); 1 additional session caught by `genai`
- "GenAI-based tools" (#37157): the `genai` substring catches it; `\bai\b` does not (no word boundary inside the merged token "GenAI"). This is the sole `genai` count
- "Geospatial Artificial Intelligence (GeoAI)" (#37432): caught by the `artificial intelligence` row
- "journalism aids remembering" (#38744): not AI. The word "aids" has no word boundary after "ai", so it is correctly excluded
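The edge cases above can be checked directly against the patterns. In this small sketch the "AI-powered moderation tools" string is a made-up example; the other three strings are the phrases quoted in the list.

```python
import re

ai_re = re.compile(r'\bai\b', re.IGNORECASE)
genai_re = re.compile(r'genai', re.IGNORECASE)
artif_re = re.compile(r'artificial intelligence', re.IGNORECASE)

samples = {
    "hyphenated": "AI-powered moderation tools",  # hypothetical phrase
    "genai": "GenAI-based tools",
    "geoai": "Geospatial Artificial Intelligence (GeoAI)",
    "aids": "journalism aids remembering",
}

def matches(text):
    """Return which of the three patterns fire on a piece of text."""
    return {
        "ai": bool(ai_re.search(text)),
        "genai": bool(genai_re.search(text)),
        "artificial intelligence": bool(artif_re.search(text)),
    }

for name, text in samples.items():
    print(name, matches(text))
```

The hyphenated form matches `\bai\b` on its own, "GenAI" and "GeoAI" do not, and "aids" matches nothing.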
| Term | Sessions | Share |
|---|---|---|
| "ai" (word-boundary) | 94 | 23.9% |
| "artificial intelligence" (substring) | 14 | 3.6% |
| "algorithmic" (word-boundary) | 36 | 9.2% |
| "algorithm" (word-boundary) | 7 | 1.8% |
| "genai" (substring) | 1 | 0.3% |
| Union (ai OR algorithmic OR algorithm OR genai) | 117 | 29.8% |
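The union row (117) is smaller than the sum of the four rows it combines (94 + 36 + 7 + 1 = 138) because a single session can match several terms. A minimal sketch of counting each session once, using made-up session texts and IDs rather than the scraped data:

```python
import re

PATTERNS = {
    "ai": re.compile(r'\bai\b', re.IGNORECASE),
    "algorithmic": re.compile(r'\balgorithmic\b', re.IGNORECASE),
    "algorithm": re.compile(r'\balgorithm\b', re.IGNORECASE),
    "genai": re.compile(r'genai', re.IGNORECASE),
}

# Hypothetical session contents keyed by a made-up session ID.
sessions = {
    1: "AI audits and algorithmic accountability",
    2: "How an algorithm ranks gig workers",
    3: "GenAI-based tools for newsrooms",
    4: "Community radio in rural areas",
}

def count_terms(sessions):
    """Per-term sets of matching session IDs, plus their union."""
    hits = {
        term: {sid for sid, text in sessions.items() if rx.search(text)}
        for term, rx in PATTERNS.items()
    }
    union = set().union(*hits.values())  # each session counted once
    return hits, union

hits, union = count_terms(sessions)
row_sum = sum(len(ids) for ids in hits.values())
print(row_sum, len(union))
```

Session 1 matches both "ai" and "algorithmic", so the row sum is 4 while the union holds only 3 sessions.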
LDA context: T0 (AI Systems & Public Accountability) is the primary AI cluster — sessions discussing data, platforms, surveillance, and algorithmic systems. C0 (Data, AI & Platform Accountability) and C3 (Civil Society Governance AI) are where AI discourse concentrates — governance frameworks, surveillance systems, and platform accountability.
Cross-Method Synthesis
LDA and HDBSCAN both identify consistent thematic structure. The old hardcoded keyword-cluster analysis is superseded and not referenced here.
| Finding | LDA (k=8) | HDBSCAN |
|---|---|---|
| Largest substantive theme | T2 Regional Governance & Policy (28.0%, 110 sessions) | C4 Open Space (21.9%, 86 sessions) |
| AI discourse | 29.5% of sessions (keyword match: word-boundary "ai", "algorithmic", or "algorithm") | C0 Data & AI Accountability (15.0%), C3 Civil Society Governance AI (13.7%) |
| Collective/movement focus | T4 Community Organizing & Grassroots (13.5%, 53 sessions) | C5 Collective Care & Movement (11.7%, 46 sessions, 80% Co-creating) |
| Legal/evidence cluster | T5 Legal Frameworks, Evidence & Platform Accountability (8.1%, 32 sessions) | C1 Legal Evidence & Journalism (7.4%, 29 sessions) |
| Youth & civic media | T7 Youth, Media & Narrative Resistance (9.9%, 39 sessions) | Concentrated in C4 Open Space (T7 k=12 rank-2 in C4) |
| Myanmar as outlier | T1 Civil Society Digital Security Capacity (2.3%, 9 sessions) | Mostly noise (HDBSCAN does not isolate it as a cluster) |
Disagreement note: The keyword-based share for AI discourse (29.5%) and the HDBSCAN cluster proportions (C0 15.0% + C3 13.7%) are not additive and measure different things. The keyword share counts every session whose Content matches an AI term, while HDBSCAN measures membership in a dense cluster. The two figures being close (29.5% vs roughly 29% combined) does not mean they cover the same sessions: a session can mention AI without landing in C0 or C3, and a session in those clusters need not match an AI keyword.
Top Sessions by Upvotes
Co-creating shared resources
| Votes | ID | Format | Title |
|---|---|---|---|
| 14 | #37330 | Panel | Confronting Online Hate and Digital Censorship in South and Southeast Asia |
| 12 | #37163 | Roundtable | From Risk to Resilience: Bringing Communities and Trainers Together for Digital Safety |
| 12 | #37058 | Panel | How AI is shaping political communications during elections |
| 11 | #37609 | Co-learning workshop | Brain-rot Activism: Strategy or Setback? |
| 11 | #37174 | Roundtable | Resourcing Digital Rights Advocacy in Southeast Asia |
Collaborative action across movements
| Votes | ID | Format | Title |
|---|---|---|---|
| 14 | #37015 | Co-learning workshop | Garbage In, Garbage Out: Exposing Gender Bias and Stereotypes in Large Language Models (LLMs) |
| 12 | #37129 | Panel | Engaging Big Tech in Southeast Asia: Strategies, Challenges, and Collective Leverage for Human Rights |
| 9 | #35073 | Panel | Myanmar Voices, Regional Support: Digital Security Peer Lab |
| 9 | #33009 | Panel | Open Tech Jam: Privacy-respecting, secure, and open digital tools for at-risk communities |
| 9 | #38333 | Ideation workshop | (Re)Imagining a Multistakeholder Model for Digital Platforms in ASEAN |
| 9 | #34922 | Panel | Your Boss is an Algorithm: Are You Playing or Being Played? |
Vote counts are a snapshot as of the scrape date (2026-03-29/30). DRAPAC voting is live; counts may have shifted since.
Methodology
Reproducibility
The complete pipeline is published as a standalone script:
A-PROJECTS/DRAPAC 26 Analysis/drapac_analysis.py
Run it with:
python3 drapac_analysis.py "$HOME/Vault/D-ARCHIVES/DRAPAC 26 Sessions Submission/"
Dependencies:
pip install gensim scikit-learn umap-learn hdbscan scipy
The script produces: LDA topics (k=8 + k=12), HDBSCAN cluster labels, UMAP 2D embedding, per-cluster topic/TF-IDF profiles, and a Track × Cluster cross-tabulation.
Data Source
Vault path: ~/Vault/D-ARCHIVES/DRAPAC 26 Sessions Submission/*.md
Source URL: https://drap.ac/26/activities/?view=<ID>
Total files: 393 (as of 2026-03-30)
Content Preprocessing
Sessions are loaded from .md files. The Content field (session description body) is extracted via regex and tokenised:
- Lowercase: normalises all text
- Strip markdown: `#headings`, `*bold`, URLs, numbers, table pipes
- Stopword removal: Gensim `STOPWORDS` + 50+ custom terms (conference procedures: "session", "workshop", "panel"; region names: "asia", "pacific"; generic modifiers: "also", "however", "new", "used")
- Filter tokens < 3 chars and pure-punctuation tokens
- Drop sessions with < 5 tokens (empty or near-empty descriptions)
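The steps above can be sketched in plain Python. This is an approximation under stated assumptions, not the original script: `STOPLIST` is a tiny stand-in for the Gensim stopword list plus custom terms, and the regexes and sample documents are illustrative.

```python
import re

# Hypothetical mini stoplist; the original combined Gensim STOPWORDS
# with 50+ custom terms.
STOPLIST = {"the", "and", "for", "with", "this", "session", "workshop",
            "panel", "asia", "pacific", "also", "however", "new", "used"}

MIN_TOKENS = 5  # sessions with fewer tokens are dropped

def preprocess(text, min_len=3):
    """Lowercase, strip markdown/URLs/numbers, drop stopwords and short tokens."""
    text = text.lower()
    text = re.sub(r'https?://\S+', ' ', text)    # URLs
    text = re.sub(r'[#*|_>\[\]()]', ' ', text)   # markdown markers, table pipes
    text = re.sub(r'\d+', ' ', text)             # numbers
    # Alphabetic tokens, allowing inner hyphens (e.g. "asia-pacific").
    tokens = re.findall(r'[a-z][a-z\-]*[a-z]|[a-z]', text)
    return [t for t in tokens if len(t) >= min_len and t not in STOPLIST]

docs = [
    "## Session\nThis **workshop** maps surveillance laws and spyware "
    "exports in 12 countries: https://example.org",
    "TBD",
]
tokenised = [preprocess(d) for d in docs]
kept = [toks for toks in tokenised if len(toks) >= MIN_TOKENS]
print(tokenised, len(kept))
```

The second document survives tokenisation as a single token and is then dropped by the minimum-length rule.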
Dictionary filtering: no_below=5 (must appear in ≥5 sessions), no_above=0.65 (removed if in >65% of sessions). This gives a vocabulary of ~1,894 terms for 393 sessions — sufficient for LDA without sparsity.
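Both thresholds act on document frequency, meaning the number of sessions a term appears in, not its total count. The pure-Python sketch below mirrors that rule on a toy corpus. It is not Gensim's implementation; the function name and the scaled-down `no_below=2` are assumptions for illustration.

```python
from collections import Counter

def filter_by_doc_frequency(docs, no_below=5, no_above=0.65):
    """Keep tokens found in at least `no_below` documents and in
    at most a fraction `no_above` of all documents."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per document
    n_docs = len(docs)
    return {
        tok for tok, df in doc_freq.items()
        if df >= no_below and df / n_docs <= no_above
    }

# Toy corpus of 4 tokenised documents (hypothetical).
docs = [
    ["digital", "rights", "surveillance"],
    ["digital", "rights", "governance"],
    ["digital", "governance", "care"],
    ["digital", "storytelling"],
]
vocab = filter_by_doc_frequency(docs, no_below=2, no_above=0.65)
print(sorted(vocab))
```

"digital" is removed for being in every document, and the single-document terms are removed for being too rare, leaving only the mid-frequency terms that help separate topics.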
LDA Topic Modeling
Two runs:
| Model | num_topics | Passes | Purpose |
|---|---|---|---|
| `lda8` | `k=8` | 40 | Primary: cleaner topics, main report table |
| `lda12` | `k=12` | 30 | HDBSCAN input: higher dim = better clustering |
Shared parameters:
```python
alpha='auto'       # learn per-document topic sparsity from data
eta='auto'         # learn per-topic word sparsity from data
random_state=42    # reproducibility
```
Output: each session gets a probability distribution over all k topics. The "dominant topic" is the one with highest probability. Topic-word distributions are ranked lists used for human interpretation.
Doc-topic matrix shape: (393, k) — used as the embedding for HDBSCAN.
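The "dominant topic" counts in the report tables come from taking the highest-probability topic per session. A minimal sketch of that step on a hypothetical doc-topic matrix (three sessions, four topics); the numbers are invented for illustration:

```python
from collections import Counter

def dominant_topic(doc_topics):
    """Index of the highest-probability topic in one session's distribution."""
    return max(range(len(doc_topics)), key=lambda k: doc_topics[k])

# Hypothetical doc-topic rows; each row sums to 1.
doc_topic = [
    [0.70, 0.10, 0.15, 0.05],
    [0.05, 0.05, 0.60, 0.30],
    [0.40, 0.35, 0.20, 0.05],
]
dominant = [dominant_topic(row) for row in doc_topic]
sessions_per_topic = Counter(dominant)
print(dominant, sessions_per_topic)
```

The third row shows what the hard assignment hides: the session is counted under topic 0 even though topic 1 is almost as likely.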
UMAP + HDBSCAN Clustering
UMAP reduces the lda12 doc-topic matrix (393 × 12) → (393 × 2) using cosine distance:
```python
umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1,
          metric='cosine', random_state=42)
```
Cosine distance is a reasonable metric for topic distributions: it compares the relative mix of topics in `doc_topic[i]` between sessions rather than absolute magnitudes.
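To make the metric concrete, the sketch below computes cosine distance between hypothetical topic distributions by hand. It only illustrates the metric; UMAP computes these distances internally.

```python
import math

def cosine_distance(p, q):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)

# Hypothetical topic distributions for three sessions (k=3 for brevity).
a = [0.8, 0.1, 0.1]
b = [0.7, 0.2, 0.1]  # similar topic mix to a
c = [0.1, 0.1, 0.8]  # dominated by a different topic

print(round(cosine_distance(a, b), 3), round(cosine_distance(a, c), 3))
```

Sessions with a similar topic mix end up close together, while a session dominated by a different topic is far away, which is the neighbourhood structure the 2D embedding tries to preserve.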
HDBSCAN clusters the 2D embedding by density:
```python
hdbscan.HDBSCAN(min_cluster_size=25, metric='euclidean',
                cluster_selection_method='eom')
```
- `eom` (Excess of Mass) selects clusters by density rather than enforcing a fixed threshold
- `min_cluster_size=25` chosen after testing `{20, 25, 30}`; gives 7 clusters + 15.3% noise
Session Metadata Extraction
The Track, Format, Track×Format, Countries, and Upvotes tables are extracted from the session .md files using regex + keyword matching. Complete standalone script (no external dependencies):
A-PROJECTS/DRAPAC 26 Analysis/drapac_metadata.py
Run it with:
python3 drapac_metadata.py ~/Vault/D-ARCHIVES/DRAPAC\ 26\ Sessions\ Submission/
Field Extraction
Each field, with its regex and notes:
- Session ID: `r'-(\d+)\.md$'`, applied to the file basename
- Upvotes: `r'##\s*#\d+\s*\n+\[(\d+)\]'`, the `[N](url)` link after the heading
- Track: `r'^\|\s*Track\s*\|\s*(.+?)\s*\|'`, table row, normalised to Co-creating/Collaborative
- Format: `r'^\|\s*Format\s*\|\s*(.+?)\s*\|'`, table row, title-cased
- Content: `r'^\|\s*Content\s*\|\s*(.+?)(?=\n\|)'`, session body only; all theme analysis uses this
- Organiser: `r'^\|\s*Organiser\s*\|\s*(.+?)\s*\|'`, table row
Track Normalisation
```python
import re

# c holds the raw text of one session .md file
tm = re.search(r'^\|\s*Track\s*\|\s*(.+?)\s*\|', c, re.MULTILINE)
track = 'Unknown'
if tm:
    t = tm.group(1).strip().lower()
    if 'co-creat' in t:
        track = 'Co-creating'      # Co-creating shared resources
    elif 'collab' in t:
        track = 'Collaborative'    # Collaborative action across movements
```
Format Extraction
```python
fm = re.search(r'^\|\s*Format\s*\|\s*(.+?)\s*\|', c, re.MULTILINE)
fmt = fm.group(1).strip().title() if fm else 'Unknown'
```
Countries — Substring Keyword Search
Case-insensitive substring match across the entire file (Title + Organiser + Content). Sessions are counted once per matched country — duplicates within a file do not inflate counts.
```python
COUNTRY_VARIANTS = {
    'Indonesia': ['indonesia'],
    'Philippines': ['philippines'],
    'India': ['india'],
    'Nepal': ['nepal'],
    'Myanmar': ['myanmar', 'burma'],
    'Bangladesh': ['bangladesh'],
    'Pakistan': ['pakistan'],
    'Malaysia': ['malaysia'],
    'Sri Lanka': ['sri lanka'],
    'Thailand': ['thailand'],
    'Taiwan': ['taiwan'],
    'Singapore': ['singapore'],
    'Vietnam': ['vietnam'],
    'China': ['china'],
    'Cambodia': ['cambodia'],
    'Korea': ['korea', 'south korea', 'north korea'],
    'Japan': ['japan'],
    'Australia': ['australia'],
    'Pacific': ['pacific', 'asia pacific', 'asia-pacific', 'apac', 'oceania'],
    'Papua': ['papua'],
    'Timor': ['timor'],
    'Laos': ['laos'],
}


def search_countries(text: str) -> list[str]:
    """Case-insensitive substring match. Returns unique countries found."""
    text_lower = text.lower()
    found = []
    for country, variants in COUNTRY_VARIANTS.items():
        for variant in variants:
            if variant in text_lower:  # substring, not word-boundary
                found.append(country)
                break  # count each country once per session
    return found

# Usage: countries = search_countries(title + ' ' + organiser + ' ' + content)
```
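Substring matching is permissive, which is worth keeping in mind when reading the country table. The sketch below contrasts it with a stricter word-boundary check. The word-boundary variant is a hypothetical alternative, not what the original script does, and the sample sentences are made up.

```python
import re

def substring_hit(variant, text):
    """Matching as described above: plain case-insensitive substring test."""
    return variant in text.lower()

def word_boundary_hit(variant, text):
    """Hypothetical stricter alternative: the variant must stand alone as a word."""
    pattern = r'\b' + re.escape(variant) + r'\b'
    return re.search(pattern, text, re.IGNORECASE) is not None

samples = [
    ("india", "Cables across the Indian Ocean"),
    ("pacific", "Digital rights in the Asia-Pacific"),
    ("nepal", "Community networks in Nepal"),
]
for variant, text in samples:
    print(variant, substring_hit(variant, text), word_boundary_hit(variant, text))
```

Under substring matching, "Indian" counts toward India, and every "Asia-Pacific" mention counts toward the Pacific row under either approach, which may partly explain why Pacific tops the table.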
Track × Format Cross-Tabulation
```python
from collections import Counter, defaultdict

# sessions: list of dicts with 'track' and 'format' keys
track_fmt = defaultdict(lambda: defaultdict(int))
for s in sessions:
    track_fmt[s['track']][s['format']] += 1
```
Upvotes Extraction
```python
m = re.search(r'##\s*#\d+\s*\n+\[(\d+)\]', content)
votes = int(m.group(1)) if m else 0
```
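A short check of the upvote pattern on sample input. The file snippets below are assumptions about the scraped layout (a `## #<id>` heading followed by a `[N](url)` link), and the URL is a placeholder.

```python
import re

UPVOTE_RE = re.compile(r'##\s*#\d+\s*\n+\[(\d+)\]')

def extract_votes(content):
    """Return the vote count after the '## #<id>' heading, or 0 if absent."""
    m = UPVOTE_RE.search(content)
    return int(m.group(1)) if m else 0

# Hypothetical file snippets.
with_votes = (
    "## #37330\n\n[14](https://example.org/vote)\n\n"
    "| Track | Co-creating shared resources |"
)
without_votes = "## #37330\n\n| Track | Co-creating shared resources |"

print(extract_votes(with_votes), extract_votes(without_votes))
```

One consequence of defaulting to 0 is that a session whose vote link failed to parse looks the same as a session with no votes.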
Aggregation
- Thematic Tracks and Session Formats tables: `Counter` aggregation of extracted `track` and `format` fields
- Track × Format matrix: `defaultdict` cross-tabulation
- Countries: `Counter` per country per track
Note: Country counts reflect unique sessions mentioning each country. A session mentioning Indonesia and the Philippines counts once in each row, so the sum of the country rows exceeds the total number of sessions.
Known Limitations
| Issue | Effect | Mitigation |
|---|---|---|
| `k` is a chosen hyperparameter | Topics depend on choice of `k` | Ran k={8,12}; 8-topic was most interpretable |
| LDA is probabilistic | Same k with different seeds gives slightly different topics | Fixed `random_state=42` throughout |
| Session length (~232 words avg) limits LDA quality | Shorter documents = noisier topic distributions | Filtered dictionary extremes; 40 passes for k=8 |
| HDBSCAN requires `min_cluster_size` tuning | Different values produce different cluster counts | Tested mcs={20,25,30}; chose mcs=25 (7 clusters, 15% noise) |
| Stopword list may filter legitimate terms | Some thematic terms removed | Custom stoplist avoids removing domain-specific terms |
| Votes | Live snapshot from scrape date | Do not treat as current values |