posts.json
[
{
"date": "2026-02-28",
"title": "💡 TIL: Neurosymbolic AI - Bringing Reason Back Into Intelligence",
"url": "/posts/til-neurosymbolic-ai.html",
"content": "<p>\n<strong>TL;DR:</strong> Neurosymbolic AI combines neural networks (pattern recognition) with symbolic logic (structured reasoning) to create systems that can both recognise <em>and</em> understand. While neural networks alone can identify a cat but can’t explain why, and rule-based systems can reason but break when reality doesn’t fit the template, neurosymbolic approaches bridge this gap - with real-world applications like mediKanren already using this hybrid approach to discover novel medical treatments by combining logical inference over scientific literature with LLM capabilities.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nPedro Domingos’ <a href=\"https://en.wikipedia.org/wiki/The_Master_Algorithm\">The Master Algorithm</a> lays out five tribes of machine learning - symbolists, connectionists, evolutionaries, Bayesians, and analogisers - and argues that the ultimate goal is a single master algorithm that unifies them all. Neurosymbolic AI feels like a tangible step toward that vision, bridging at least two of those tribes: the symbolists and the connectionists.</p>\n<p>\nSymbolic AI first caught my eye when I read about Matt Might’s work on mediKanren - a reasoning engine that digests biomedical literature into logical relations and infers novel treatments by connecting disparate parts of the medical knowledge graph. Recently, a concise <a href=\"https://www.youtube.com/watch?v=ZfWDVO3rzeA\">video explainer on Neurosymbolic AI</a> helped crystallise why this hybrid approach matters, and how it connects to work like Might’s.</p>\n<h2>\nThe Problem: Recognition Without Understanding</h2>\n<p>\nThe video opens with an analogy that resonates: current AI is like a student who memorises every answer but doesn’t understand the material. Neural networks excel at pattern recognition - tagging photos, generating text - but can’t explain <em>why</em> they reach their conclusions. 
Show them a plastic plant and they’ll confidently call it real, because they’ve learned what plants <em>look like</em>, not what they <em>are</em>.</p>\n<p>\nConversely, classical symbolic AI reasons step by step - leaves plus stem equals plant - but freezes when a cactus shows up. It’s logic without intuition.</p>\n<h2>\nNeurosymbolic AI: Both Sides of the Coin</h2>\n<p>\nNeurosymbolic AI combines the neural (learning) with the symbolic (reasoning). The video’s stop sign example illustrates this well: the neural side detects shapes and colours, while the symbolic side applies rules like “<em>red and octagonal means stop sign</em>“. Even with stickers, lighting changes, or paint, the system still understands <em>why</em> a stop sign looks the way it does.</p>\n<p>\nImportantly, these systems can learn new rules. The whale example from the video demonstrates meta-learning: when a model that’s learned “<em>mammals have fur</em>“ encounters a whale, it can reason - whales give birth to live young and have lungs - and update its logic without retraining on millions of examples.</p>\n<h2>\nFrom Theory to Practice: mediKanren</h2>\n<p>\nMatt Might’s mediKanren provides a compelling real-world implementation of neurosymbolic ideas. Might, director of the Hugh Kaul Precision Medicine Institute at the University of Alabama at Birmingham, created mediKanren after his son Bertrand was diagnosed with a previously unknown rare disease (NGLY1 deficiency).</p>\n<p>\nmediKanren digests PubMed abstracts using NLP, extracts simple relations (X inhibits Y, Y causes Z), and layers logical inference on top to discover potential treatments by connecting findings across disparate papers. It’s essentially the neurosymbolic approach applied to medicine - neural methods for parsing natural language, symbolic reasoning for drawing logical conclusions.</p>\n<p>\nThe team is now developing mediKanren-GPT, which combines the symbolic, explainable approach of mediKanren with LLMs. 
As Might has noted, mediKanren can <em>verify</em> the assertions coming out of LLMs - if an LLM makes a claim, that claim can be passed to a symbolic AI for verification. This addresses one of the core weaknesses of pure neural approaches: the inability to guarantee factual correctness.</p>\n<h2>\nConclusion</h2>\n<p>\nThe neurosymbolic approach represents a path beyond the current limitations of both pure neural networks and rigid rule-based systems. What makes it particularly interesting isn’t just the theoretical elegance of combining pattern recognition with structured reasoning - it’s that systems like mediKanren are already applying this hybrid approach to discover medical treatments that no human researcher had connected.</p>\n<p>\nAs Domingos argued, the real breakthroughs will come from unifying the tribes of machine learning. Neurosymbolic AI is one of the most promising steps in that direction - logic and learning, working hand in hand.</p>\n<p>\nTIL that neurosymbolic AI isn’t the fringe approach I’d assumed it was. I’d known about its potential since reading about mediKanren, but the relative silence around it had me second-guessing whether it was gaining real traction. Seeing it presented as a growing, practical field - with real applications in science, finance, and law - was a welcome reminder that sometimes the quiet ideas are the ones worth paying attention to.</p>\n",
"tags": [
"til",
"ai",
"neurosymbolic",
"symbolic-ai",
"neural-network",
"reasoning",
"interpretability"
]
},
{
"date": "2026-02-28",
"title": "💡 TIL: I've Been Doing Spec-Driven Development Without Realising",
"url": "/posts/TIL-spec-driven-development.html",
"content": "<p>\n<strong>TL;DR:</strong> Watching a video on Spec-Driven Development, I realised I’ve been practising a form of it for the past year - starting from behaviour specifications and constraints before letting coding agents implement. Combined with Simon Willison’s concept of Agentic Engineering, this gave a name to an approach I’d developed through trial and error: directing AI agents with upfront planning rather than iterating through vague prompts.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nI stumbled across a video titled <a href=\"https://www.youtube.com/watch?v=mViFYTwWvcM\">Spec-Driven Development: AI Assisted Coding Explained</a> and had one of those “<em>oh, that’s what I’ve been doing</em>“ moments.</p>\n<h2>\nSpec-Driven Development</h2>\n<p>\nThe video contrasts <strong>vibe coding</strong> - prompt, generate, tweak, repeat - with <strong>spec-driven development</strong>, where you specify desired behaviour and constraints <em>before</em> any code gets written. The spec becomes a contract: requirements first, then a design document, then implementation. Nothing gets coded until the spec is approved. The presenter positions it as test-driven development and behaviour-driven development “on steroids”, where the spec becomes the primary artifact driving all downstream work.</p>\n<h2>\nMy Experience</h2>\n<p>\nI’ve been doing a form of this for the past few months. When working with coding agents, I usually start by planning - writing out what I want the system to do, the constraints, the expected behaviour - before letting the agent loose. Not always; sometimes a quick script just needs a quick prompt. But for anything substantial, the planning step has become instinctive.</p>\n<p>\nI wouldn’t call what I do vibe coding, though. 
Simon Willison’s term <a href=\"https://simonwillison.net/2026/Feb/23/agentic-engineering-patterns/\"><strong>agentic engineering</strong></a> captures it better - professional developers amplifying their existing experience with coding agents, rather than ignoring the code entirely. The spec-driven approach fits naturally within that: you’re not vibing, you’re <em>directing</em>.</p>\n<h2>\nConclusion</h2>\n<p>\nWhat I found validating about the video is the emphasis on reducing ambiguity. When you give an agent a spec rather than a vague prompt, you get consistent results. That matches my experience - the more precise my upfront planning, the fewer back-and-forth iterations I need.</p>\n<p>\nTIL there’s a name for what I’ve been doing.</p>\n",
"tags": [
"til",
"ai",
"agentic-engineering",
"spec-driven-development",
"productivity"
]
},
{
"date": "2026-02-21",
"title": "🛠️ Finding Balance: My Current AI Development Toolstack",
"url": "/posts/finding-balance-my-current-ai-development-toolstack.html",
"content": "<p>\n<strong>TL;DR:</strong> After experimenting with various AI assistants for coding and learning, I’ve discovered that each serves a distinct purpose rather than forming a complementary set. Claude Code excels at codebase exploration but sacrifices user control, Solveit offers deep understanding at the cost of speed, Lumo provides a superior privacy-focused chat interface, whilst GitHub Copilot can be useful for code review and GitHub Actions. Rising subscription costs make hardware investments like Mac Mini or AMD Strix Halo increasingly attractive for running open-weight models.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nOver the past year, I’ve experimented with multiple AI development tools, each with unique strengths and weaknesses. My goal has been to find a minimal, effective toolstack that covers all my needs whilst reducing subscription costs. What I’ve discovered is that each tool excels in specific domains rather than forming a truly complementary ecosystem.</p>\n<h2>\nMy Current Toolstack</h2>\n<h3>\nClaude Code (Used Daily)</h3>\n<p>\nClaude Code has become my primary workhorse for several key tasks:</p>\n<ul>\n <li>\n<strong>Codebase exploration</strong>: Navigating unfamiliar repositories and understanding other developers’ code </li>\n <li>\n<strong>Multi-file fixes</strong>: Making coordinated changes across multiple files </li>\n <li>\n<strong>Code documentation</strong>: Generating comprehensive documentation for existing code </li>\n <li>\n<strong>Article drafting</strong>: Creating initial outlines like this one before manual refinement </li>\n</ul>\n<p>\nWhile Claude Code excels at complex requirements and large codebases, its primary limitation is a lack of user control. The agentic AI takes charge of the coding process, often overcomplicating both code and text. 
I find myself regularly simplifying its output to match my preference for minimalist solutions.</p>\n<p>\nClaude Max x20 at <a href=\"https://techcrunch.com/2025/04/09/anthropic-rolls-out-a-200-per-month-claude-subscription/\">$200/month</a> feels steep when a capable miniPC running open-weight models -<a href=\"https://openhands.dev/blog/minimax-m2-5-open-weights-models-catch-up-to-claude\">likely approaching Sonnet capability within 6 months</a>- costs similar money upfront. Open-weight models <a href=\"https://epoch.ai/data-insights/open-weights-vs-closed-weights-models\">typically lag frontier models by six months</a>, yet <a href=\"https://www.technologyreview.com/2025/01/24/1110526/china-deepseek-top-ai-despite-sanctions/\">accelerated releases from Chinese research groups</a> suggest this gap may narrow. An <a href=\"https://wccftech.com/amd-ryzen-ai-max-395-strix-halo-mini-pc-tested-powerful-apu-up-to-140w-power-128-gb-variable-memory-igpu/\">AMD Strix Halo system</a> might prove worthwhile for experimenting with powerful open-weight models rather than maintaining expensive subscriptions. 
At <a href=\"https://www.tweaktown.com/news/103292/amd-ryzen-ai-max-395-strix-halo-apu-mini-pc-tested-up-to-140w-power-128gb-of-ram/index.html\">~140W power consumption</a>, even running 24/7, electricity costs remain manageable compared to subscription fees.</p>\n<h3>\nGitHub Copilot (Used Occasionally)</h3>\n<p>\nCopilot occupies an interesting position in my workflow:</p>\n<ul>\n <li>\n<strong>Inline code suggestions</strong>: Useful for repetitive patterns and common operations </li>\n <li>\n<strong>GitHub Actions workflows</strong>: Strong at suggesting CI/CD configurations </li>\n <li>\n<strong>Documentation generation</strong>: Reasonable at generating docstrings, doctests and comments </li>\n</ul>\n<p>\nHowever, I’ve found Copilot frequently gets in my way, offering suggestions when I don’t need them and sometimes requiring more effort to correct than to write from scratch. Of all my current tools, Copilot provides the least unique value given its overlap with other assistants.</p>\n<h3>\nProton’s Lumo (Used Occasionally)</h3>\n<p>\nLumo has become a great helping hand, with strong privacy guarantees:</p>\n<ul>\n <li>\n<strong>Refining ideas</strong>: Excellent conversational partner for brainstorming and iteration </li>\n <li>\n<strong>Small code snippets</strong>: Generates concise, practical solutions without overengineering </li>\n <li>\n<strong>ChatGPT alternative</strong>: A smooth, privacy-focused chat interface </li>\n</ul>\n<p>\nWhat sets Lumo apart is Proton’s commitment to privacy. Unlike many AI tools, Lumo operates with a strict no-logs policy and zero-access encryption, ensuring conversations remain confidential. </p>\n<p>\nI’m looking forward to the potential of Proton releasing an API for Lumo, which might eventually allow it to serve as a replacement for other AI services in some contexts. 
However, there’s no guarantee this will happen as the company may have different plans for its development.</p>\n<h3>\nSolveit (Used Daily)</h3>\n<p>\nI’m still exploring Solveit’s capabilities:</p>\n<ul>\n <li>\n<strong>Literate programming</strong>: Integrates documentation and code development seamlessly </li>\n <li>\n<strong>Learning new domains</strong>: Particularly strong for building Python projects from scratch </li>\n <li>\n<strong>Book chapter writing</strong>: Structured approach to technical content creation </li>\n <li>\n<strong>Close reading</strong>: Deep dive into research papers, books, blog posts </li>\n</ul>\n<p>\nWhat makes Solveit distinctive is its methodical approach based on George Polya’s classic problem-solving framework from his 1945 book “<em>How to Solve It</em>”. Instead of generating large blocks of code, Solveit encourages building solutions incrementally, typically one or two lines at a time, maintaining human agency throughout the process.</p>\n<p>\nUnlike Claude Code’s automated approach where the agentic AI takes control, Solveit keeps me firmly in the driver’s seat whilst still providing AI assistance. This results in deeper understanding and learning, albeit at the cost of development speed. The platform’s design as a “dialogue engineering” environment rather than just a code generator helps avoid the cognitive debt that can accumulate when over-relying on AI-generated content.</p>\n<p>\nI’m actively evaluating whether Solveit could become my primary IDE for production code contributions, though its design scope may not fully extend to this use case yet.</p>\n<h3>\nMistral Vibe (Evaluated)</h3>\n<p>\nI evaluated Mistral’s agentic AI CLI tool Mistral Vibe with Devstral-2 123B via API. The generous Le Chat Pro usage limits were welcome compared to Claude Pro’s increasingly restrictive quotas.</p>\n<p>\nMistral Vibe follows a similar approach to Claude Code, providing an agentic AI interface for coding tasks. 
The tool handled basic operations competently and showed decent understanding of project context. However, Devstral-2 feels noticeably weaker than Claude Sonnet and Opus in several key areas:</p>\n<ul>\n <li>\n<strong>Code quality</strong>: Generated solutions were functional but less elegant </li>\n <li>\n<strong>Context understanding</strong>: Struggled with complex multi-file relationships </li>\n <li>\n<strong>Problem solving</strong>: Required more iterations to reach satisfactory solutions </li>\n</ul>\n<p>\nWhile the cost advantages are compelling, the capability gap made it unsuitable as a primary replacement for Claude Code in my workflow.</p>\n<h2>\nFinding the Optimal Balance</h2>\n<p>\nAfter several months of experimentation, I’ve concluded that GitHub Copilot is the weakest link in my current stack. While I’ve got an annual subscription, I’ll limit its use to GitHub Actions and smaller GitHub issues.</p>\n<p>\nMy evaluation of Mistral Vibe reinforced this assessment -despite generous usage limits, Devstral-2’s capabilities lag noticeably behind Claude’s models, making it unsuitable as a primary replacement despite potential cost savings.</p>\n<p>\nRather than seeking an assortment of tools, I’ve found that each tool serves a distinct purpose with its own strengths and trade-offs:</p>\n<table>\n <thead>\n <tr>\n <th style=\"text-align: left;\">\nTask </th>\n <th style=\"text-align: left;\">\nCurrent Best Tool </th>\n <th style=\"text-align: left;\">\nKey Trade-off </th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td style=\"text-align: left;\">\nCodebase exploration </td>\n <td style=\"text-align: left;\">\nClaude Code </td>\n <td style=\"text-align: left;\">\nSacrifices user control to agentic AI </td>\n </tr>\n <tr>\n <td style=\"text-align: left;\">\nMulti-file changes </td>\n <td style=\"text-align: left;\">\nClaude Code </td>\n <td style=\"text-align: left;\">\nMay overcomplicate solutions </td>\n </tr>\n <tr>\n <td style=\"text-align: left;\">\nChat 
interface/brainstorming </td>\n <td style=\"text-align: left;\">\nLumo </td>\n <td style=\"text-align: left;\">\nLimited code integration </td>\n </tr>\n <tr>\n <td style=\"text-align: left;\">\nPrivacy-focused interactions </td>\n <td style=\"text-align: left;\">\nLumo </td>\n <td style=\"text-align: left;\">\nNo API access (yet?) </td>\n </tr>\n <tr>\n <td style=\"text-align: left;\">\nDeep learning/controlled coding </td>\n <td style=\"text-align: left;\">\nSolveit </td>\n <td style=\"text-align: left;\">\nSlower development pace </td>\n </tr>\n <tr>\n <td style=\"text-align: left;\">\nGitHub specific tasks </td>\n <td style=\"text-align: left;\">\nGitHub Copilot </td>\n <td style=\"text-align: left;\">\nLimited unique value </td>\n </tr>\n <tr>\n <td style=\"text-align: left;\">\nProduction coding </td>\n <td style=\"text-align: left;\">\nStill evaluating </td>\n <td style=\"text-align: left;\">\nControl vs. speed </td>\n </tr>\n </tbody>\n</table>\n<p>\nThe bigger question may be sustainability. <a href=\"https://www.anthropic.com/pricing\">Claude Max’s $200/month cost</a> competes directly with hardware investments like an AMD Strix Halo or a Mac Mini that could run increasingly capable open-weight models. 
If <a href=\"https://fourweekmba.com/the-open-model-convergence-how-the-frontier-gap-collapsed-to-6-months/\">open-weight models continue closing the gap</a> with frontier models -<a href=\"https://aarambhdevhub.medium.com/open-source-ai-vs-paid-ai-for-coding-the-ultimate-2026-comparison-guide-ab2ba6813c1d\">potentially within 6 months</a>- local hardware might offer better long-term value than subscription services.</p>\n<p>\nI’m looking for a minimal, compact toolbox that covers my computational and learning requirements whilst remaining cost-effective as the AI landscape evolves.</p>\n<h2>\nThe Path Forward</h2>\n<p>\nEach tool occupies a different niche in my workflow:</p>\n<ol>\n <li>\n<strong>Claude Code</strong> for tasks where I need to understand large codebases or make coordinated changes across multiple files, accepting some loss of control to agentic AI </li>\n <li>\n<strong>Lumo</strong> for conversational interactions where privacy is paramount, especially when brainstorming sensitive topics </li>\n <li>\n<strong>Solveit</strong> for deeper learning experiences and maintaining complete control of the code creation process </li>\n <li>\n<strong>GitHub Copilot</strong> for GitHub-specific tasks until my subscription expires </li>\n</ol>\n<p>\nUnlike my initial assumption that tools would complement each other, I’ve discovered they serve different use cases with distinct trade-offs between control, privacy, speed, and depth of understanding.</p>\n<h2>\nConclusion</h2>\n<p>\nThe key insight from this exploration is that AI tools present distinct trade-offs rather than forming a complementary ecosystem. 
Each tool -Claude Code’s agentic automation, Lumo’s privacy-focused chat, Solveit’s methodical learning approach, and Copilot’s GitHub integration- serves specific use cases with inherent compromises between control, privacy, speed, and depth.</p>\n<p>\nMy goal is to narrow down to two tools that best address my needs, reducing both subscription costs and cognitive overhead. The challenge isn’t finding perfect complementarity, but identifying which specific trade-offs I’m willing to accept for different types of work as the AI landscape continues evolving. Currently the combination of Claude Code and Solveit seems to strike the best balance for my needs.</p>\n",
"tags": [
"toolchain",
"ai",
"productivity",
"minimal",
"claude",
"github-copilot",
"lumo",
"solve-it",
"mistral"
]
},
{
"date": "2025-10-21",
"title": "💡 TIL: Hybrid RAG - Combining the Best of Sparse and Dense Retrieval",
"url": "/posts/TIL-hybrid-rag.html",
"content": "<p>\n<strong>TL;DR:</strong> Retrieval Augmented Generation (RAG) uses three main retrieval strategies: (1) Sparse retrieval (50 years old) relies on keyword matching via TF-IDF/BM25- excellent for exact matches but poor with synonyms; (2) Dense retrieval (5-10 years old) uses vector embeddings to capture semantic meaning- better for natural language but misses rare terms; (3) Hybrid retrieval (2-3 years old) combines both approaches with fusion algorithms to merge results. Hybrid retrieval is now the gold standard, balancing precision, recall, and processing speed for modern RAG systems.</p>\n<!--more-->\n<h2>\nRAG Retrieval: The Key to Accurate AI Responses</h2>\n<p>\nThis post is based on a concise and informative video titled <a href=\"https://yewtu.be/watch?v=r0Dciuq0knU\">Hybrid RAG</a> from the IBM Technology YouTube channel. The video provides an excellent short introduction to what Hybrid RAG is.</p>\n<p>\nA RAG system’s effectiveness depends largely on its retrieval strategy- how it fetches information to feed into an LLM. 
The process works by:</p>\n<ol>\n <li>\nProcessing a user query </li>\n <li>\nRetrieving relevant chunks from a knowledge base </li>\n <li>\nFeeding those chunks to an LLM </li>\n</ol>\n<p>\nThe quality of retrieved information directly impacts the factual accuracy of the LLM’s responses.</p>\n<p>\n <img src=\"/images/Hybrid%20RAG.png\" alt=\"Visual comparison of Sparse, Dense, and Hybrid RAG approaches\">\n</p>\n<p>\nLet’s explore the three major retrieval strategies:</p>\n<h2>\nSparse Retrieval: The Classic Approach (50 years old)</h2>\n<p>\n<strong>How it works</strong>: Uses keyword matching through TF-IDF and BM25, counting term frequency in documents and scoring accordingly.</p>\n<p>\n<strong>Pros</strong>:</p>\n<ul>\n <li>\nSimple and fast implementation </li>\n <li>\nHighly scalable </li>\n <li>\nCost-effective (no embeddings required) </li>\n <li>\nEffective for domain-specific terminology </li>\n <li>\nCan sometimes outperform complex models for specialised terms </li>\n</ul>\n<p>\n<strong>Cons</strong>:</p>\n<ul>\n <li>\nPoor with synonyms and related concepts </li>\n <li>\nLimited contextual understanding </li>\n <li>\nStruggles with conceptual queries </li>\n</ul>\n<p>\n<strong>Best uses</strong>: Scenarios requiring exact wording- short queries, code search, log analysis, legal clauses.</p>\n<p>\n<strong>Implementations</strong>: Elasticsearch, Apache Lucene, Milvus</p>\n<h2>\nDense Retrieval: The Semantic Workhorse (5-10 years old)</h2>\n<p>\n<strong>How it works</strong>: Maps queries and documents into vector space using embeddings (often called “vector search”), finding results based on semantic similarity.</p>\n<p>\n<strong>Pros</strong>:</p>\n<ul>\n <li>\nStrong contextual understanding </li>\n <li>\nHandles synonyms and paraphrasing well </li>\n <li>\nFlexible for natural language queries </li>\n <li>\nCaptures content meaning effectively </li>\n</ul>\n<p>\n<strong>Cons</strong>:</p>\n<ul>\n <li>\nMisses rare terms and jargon </li>\n <li>\nLess 
effective for very short queries </li>\n <li>\nMore computationally intensive </li>\n <li>\nRequires domain adaptation </li>\n</ul>\n<p>\n<strong>Best uses</strong>: Chatbots, customer service, research over unstructured knowledge bases.</p>\n<p>\n<strong>Implementations</strong>: Meta’s FAISS, JVector</p>\n<h2>\nHybrid Retrieval: The Current State of the Art (2-3 years old)</h2>\n<p>\n<strong>How it works</strong>: Combines vector-based and keyword-based search, processing queries through both methods and merging results.</p>\n<p>\n<strong>Pros</strong>:</p>\n<ul>\n <li>\nLeverages strengths of both approaches </li>\n <li>\nOutperforms dense-only retrieval in benchmarks </li>\n <li>\nImproves precision and recall metrics </li>\n <li>\nHandles both semantics and rare terms </li>\n</ul>\n<p>\n<strong>Fusion algorithms</strong>:</p>\n<ul>\n <li>\nWeighted sum (e.g., 70% dense, 30% sparse) </li>\n <li>\nReciprocal Rank Fusion (RRF), which merges results by their rank positions in each list </li>\n</ul>\n<p>\n<strong>Best uses</strong>: Specialised domains (legal, technical, medical) and general-purpose retrieval requiring high accuracy.</p>\n<p>\n<strong>Implementations</strong>: Elasticsearch, Milvus, Weaviate, DataStax Astra DB</p>\n<h2>\nWhy Hybrid Retrieval Leads the Pack</h2>\n<p>\nIf sparse retrieval is fast but literal, and dense retrieval is contextually aware but misses specific terms, hybrid retrieval offers the best combination:</p>\n<ol>\n <li>\n<strong>Complementary strengths</strong>: Semantic matching for concepts, keyword matching for critical terms </li>\n <li>\n<strong>Balanced performance</strong>: Optimises for speed, precision, and recall </li>\n <li>\n<strong>Adaptability</strong>: Works across different domains and query types </li>\n <li>\n<strong>Improved accuracy</strong>: Consistently outperforms single-method approaches </li>\n</ol>\n<h2>\nConclusion</h2>\n<p>\nRetrieval strategies have evolved from simple keyword matching to sophisticated semantic 
understanding, with hybrid approaches now delivering superior results.</p>\n<p>\nFor RAG system developers today, hybrid retrieval offers the most balanced approach, combining the precision of keyword search with the contextual understanding of vector embeddings in a unified solution.</p>\n<p>\nThis TIL is based on the excellent explanation in IBM Technology’s video on Hybrid RAG, which I think is well worth your time.</p>\n",
"tags": [
"til",
"rag",
"llm",
"retrieval",
"ai"
]
},
{
"date": "2025-10-17",
"title": "💡 TIL: Claude Skills - Modular AI Capabilities with Minimal Token Cost",
"url": "/posts/TIL-claude-skills.html",
"content": "<p>\n<strong>TL;DR:</strong> Claude Skills extend capabilities through modular instruction sets that load dynamically when needed. Using YAML frontmatter summaries (~dozens of tokens), skills keep full implementations ready for relevant tasks while minimizing overhead- enabling everything from document creation to financial analysis without performance degradation.</p>\n<!--more-->\n<h2>\nWhat Are Claude Skills?</h2>\n<p>\n<a href=\"https://www.anthropic.com/news/skills\">Anthropic’s Claude Skills</a> are folders containing instructions, scripts, and resources that extend the AI’s capabilities for specialized tasks. Claude scans available skills and loads only necessary components when relevant to the current task.</p>\n<p>\nKey characteristics:</p>\n<ul>\n <li>\n<strong>Composable</strong>: Multiple skills work together automatically </li>\n <li>\n<strong>Portable</strong>: Same format works across Claude applications, Claude Code, and API </li>\n <li>\n<strong>Efficient</strong>: Uses YAML frontmatter summaries consuming only dozens of tokens </li>\n <li>\n<strong>Powerful</strong>: Includes executable code for reliable task execution </li>\n</ul>\n<h2>\nTechnical Implementation</h2>\n<p>\nThe efficiency of Claude Skills comes from their implementation:</p>\n<ol>\n <li>\n <p>\nSkills use YAML frontmatter summaries in Markdown files <a href=\"https://simonwillison.net/2025/Oct/16/claude-skills/\">as described by Simon Willison</a> </p>\n <ul>\n <li>\n<em>YAML frontmatter</em>: Structured metadata (delimited by triple dashes <code class=\"inline\">---</code>) at the beginning of Markdown files containing required <code class=\"inline\">name</code> and <code class=\"inline\">description</code> fields that Claude scans at startup to determine which skills are relevant to a task </li>\n </ul>\n </li>\n <li>\n <p>\nThis approach dramatically reduces token overhead compared to alternatives like Model Context Protocol, which “<em>famously consumes tens of 
thousands of tokens</em>”, per Willison </p>\n </li>\n <li>\n <p>\nCode execution occurs in a secure environment </p>\n </li>\n</ol>\n<p>\nWillison’s analysis of a Slack GIF creator skill demonstrates that skills can import helper modules (like <code class=\"inline\">GIFBuilder</code>), use validation functions (e.g., <code class=\"inline\">check_slack_size()</code>), and save outputs to designated locations such as <code class=\"inline\">/mnt/user-data/outputs/</code>.</p>\n<h2>\nHow to Use Skills</h2>\n<p>\nSkills are currently available to Pro, Max, Team, and Enterprise users on Claude.ai:</p>\n<ol>\n <li>\n<strong>Pre-built Skills</strong>: Document creation (Excel, PowerPoint, Word, PDFs) </li>\n <li>\n<strong>Custom Skills</strong>: Developers can create these via the <code class=\"inline\">/v1/skills</code> API endpoint </li>\n</ol>\n<p>\nUsing a skill simply requires asking Claude to perform a task in the skill’s domain; Claude identifies and loads the relevant skill automatically.</p>\n<h2>\nApplications</h2>\n<p>\nReal-world applications include:</p>\n<ol>\n <li>\n<strong>Document Processing</strong>: Create and modify Excel spreadsheets, PowerPoint presentations, PDFs </li>\n <li>\n<strong>Data Analysis</strong>: Perform financial calculations and visualizations with specialized techniques </li>\n <li>\n<strong>Workflow Automation</strong>: <a href=\"https://www.anthropic.com/news/skills\">Companies report</a> 83% time reduction on specialized tasks </li>\n</ol>\n<p>\nClaude Skills represent an important evolution in AI assistant capabilities, enabling specialized task performance while maintaining core performance across interfaces and applications.</p>\n",
"tags": [
"til",
"ai",
"claude",
"llm",
"productivity"
]
},
{
"date": "2025-10-02",
"title": "💡 TIL: Advice for Mid-Career AI Researchers",
"url": "/posts/TIL-advice-for-midcareer-ai-researchers.html",
"content": "<p>\n<strong>TL;DR:</strong> Leading AI researchers emphasise practical coding skills, early access to GPUs, building rather than studying, public visibility, finding your unique perspective, and continuous learning. The most crucial insight: real breakthroughs come from researchers who ignore consensus, think for themselves, and tackle hard problems. Most encouragingly, they stress that anyone with determination can become an expert in this rapidly evolving field.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nIn a field evolving as rapidly as artificial intelligence, mid-career researchers often find themselves at a crossroads, balancing established expertise with the need to adapt to new paradigms. A recent compilation of advice from leading AI researchers and practitioners offers valuable insights for navigating this challenging terrain. These perspectives from industry leaders at organisations like OpenAI, Anthropic, Google DeepMind, and others reveal common themes about what truly matters for success in AI research today, with a surprising emphasis on coding skills, infrastructure knowledge, and the courage to challenge consensus.</p>\n<h2>\nKey Advice from Industry Leaders</h2>\n<p>\nBelow is <a href=\"https://xcancel.com/chrisbarber/status/1973405958786429285\">a collection of insights</a> from leading AI researchers and practitioners, in response to the question: “<em>What career advice do you have for AI researchers, or what you wish you’d learned earlier?</em>”:</p>\n<h3>\nDevelop Strong Coding Skills</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/jeremyphoward\">@jeremyphoward</a>, <a href=\"https://xcancel.com/answerdotai\">@answerdotai</a>, <a href=\"https://xcancel.com/fastdotai\">@fastdotai</a></strong> Mid-career AI researchers (in fact all levels!): focus on becoming really good coders. Learn to replicate interesting research papers from scratch. 
Code is the medium we use to experiment, so if you’re better at it, you can run more complex and creative experiments more quickly. </p>\n</blockquote>\n<h3>\nPrioritise GPU Access</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/jacobmbuckman\">@jacobmbuckman</a>, Manifest AI</strong> Get as close to as many gpus as early as possible. Almost nothing else you could do is higher value. </p>\n</blockquote>\n<h3>\nBuild, Don’t Just Study</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/rronak_\">@rronak_</a>, Google DeepMind</strong> Stop studying, build. Go one layer deeper into the infra than feels comfortable, since that’s where the value is. If you’re on Langchain, write the agent loop yourself; if you’re on Verl, write the pytorch yourself; If you’re on Megatron, write the cuda kernels yourself. </p>\n</blockquote>\n<h3>\nEstablish Your Public Presence</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/finbarrtimbers\">@finbarrtimbers</a>, Ai2</strong> People need to know you exist to give you opportunities. Write about interesting ideas you have or things you are thinking about. There is an extreme hunger for “interesting lunch conversation at DeepMind” level content (not hype boi threads, not paper level technical). </p>\n</blockquote>\n<h3>\nFind Your Unique Perspective</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/_arohan_\">@_arohan_</a>, Anthropic</strong> <br>\n1. Perspective matters more than novelty, many so-called ‘solved’ problems still hide unsolved challenges in the details. Don’t dismiss anything as trivial; breakthroughs are hidden in plain sight. <br>\n2. Don’t worry about pedigree or fitting in. Hinton bet on neural networks when the field dismissed them. The real breakthroughs come from researchers who ignore the consensus, think for themselves, and tackle hard problems- and that researcher can be you. 
</p>\n</blockquote>\n<h3>\nInvest in Foundational Knowledge</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/danielhanchen\">@danielhanchen</a>, Unsloth AI</strong> I would definitely watch MIT, Stanford videos much much earlier- <a href=\"https://yewtu.be/playlist?list=PLoROMvodv4rOmsNzYBMe0gJY2XS8AQg16\">CS231N</a>, do <a href=\"https://course.fast.ai/\">FastAI courses</a>, <a href=\"https://yewtu.be/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI\">MIT’s AI course</a>, Gilbert Strang’s courses + <a href=\"https://yewtu.be/playlist?list=PLULjW8y9XZKiTBlTFPVDLebgtctNZCgG-\">CS229</a>. </p>\n</blockquote>\n<p>\n<em>Note: For readers interested in Gilbert Strang’s courses, some excellent options include <a href=\"https://yewtu.be/playlist?list=PLE7DDD91010BC51F8\">MIT 18.06 Linear Algebra</a>, <a href=\"https://yewtu.be/playlist?list=PLBE9407EA64E2C318\">Highlights of Calculus</a>, <a href=\"https://yewtu.be/playlist?list=PLUl4u3cNGP63oMNUHXqIUcrkS2PivhN3k\">MIT 18.065 Matrix Methods</a>, <a href=\"https://yewtu.be/playlist?list=PLUl4u3cNGP61iQEFiWLE21EJCxwmWvvek\">A Vision of Linear Algebra</a>, and <a href=\"https://yewtu.be/watch?v=UcWsDwg1XwM&list=PLBE9407EA64E2C318&index=1\">Big Picture of Calculus</a>.</em></p>\n<h3>\nTake Action and Be Selective</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/BlackHC\">@BlackHC</a>, Google DeepMind</strong> Something I tell people: Just try and do things, and if you’re in a place where you can’t innovate or learn, figure out whether to switch. Everything compounds, so it pays off long-term to be picky. Compromise compensation for role fit if necessary. Something I wish I had internalized earlier: When in doubt, double down and work hard. Nothing builds confidence and motivation like working harder and getting results. 
</p>\n</blockquote>\n<h3>\nConnect with the Research Community</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/TianbaoX\">@TianbaoX</a>, OpenAI</strong> I wish I had learned earlier how important it is to actively exchange ideas with others and stay close to the frontier- AI is a fast-moving field, and working in isolation rarely leads to impact. Because scaling laws hold so strongly, incremental work often gets washed out; the real breakthroughs come from daring to imagine and build something fundamentally new. </p>\n</blockquote>\n<h3>\nSeek Fulfilling Work</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/sandersted\">@sandersted</a>, OpenAI</strong> If you’re unhappy in a job, look for a better one. </p>\n</blockquote>\n<h3>\nRemember Expertise Is Accessible</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/rolandgvc\">@rolandgvc</a>, xAI</strong> You’re 2 weeks away from catching up to the state of the art. It’s never been easier to become an expert in anything. </p>\n</blockquote>\n<h3>\nStay Adaptable</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/angli_ai\">@angli_ai</a>, Simular AI</strong> The opportunities often come from places you couldn’t have imagined- stay curious, adapt fast, and be willing to reinvent. </p>\n</blockquote>\n<h3>\nFocus on Legibility</h3>\n<blockquote>\n <p>\n<strong><a href=\"https://xcancel.com/alexeyguzey\">@alexeyguzey</a>, OpenAI</strong> To get a job at a big company you need to be legible to a big company. </p>\n</blockquote>\n<h2>\nConclusion</h2>\n<p>\nThese insights reveal a significant shift in what drives success in modern AI research. The traditional academic path has given way to a more hands-on, build-focused approach where practical skills and independent thinking create outsized impact.</p>\n<p>\nWhat’s especially hopeful is the democratisation of knowledge underlined in many of these recommendations, i.e. 
expertise in AI is more accessible than ever before, and anyone with determination can meaningfully contribute regardless of institutional affiliation.</p>\n<p>\nPerhaps the most compelling theme is the emphasis on challenging established thinking. This mindset, combined with technical depth and community engagement, seems to be the true differentiator in today’s AI research landscape.</p>\n",
"tags": [
"til",
"ai",
"research",
"career",
"learning",
"advice",
"perspective",
"breakthrough"
]
},
{
"date": "2025-09-25",
"title": "🦀 Transitioning from Python to Rust: A Minimalist Approach",
"url": "/posts/transitioning-from-python-to-rust-for-ai.html",
"content": "<p>\n<strong>TL;DR:</strong> Moving from Python to Rust for AI work requires a phased approach focusing on self-contained utilities first, leveraging PyO3 for hybrid integration, and adopting a minimal subset of Rust features before expanding. This strategy maintains productivity while gradually unlocking Rust’s type safety, performance, and cross-platform deployment advantages.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nAfter previously discussing the potential of doing AI and Data Science with <a href=\"https://ai-mindset.github.io/posts/deno.html\">Deno</a> or <a href=\"https://ai-mindset.github.io/posts/go-pragmatic-modern-development.html\">Go</a>, I’ve found Rust to be a compelling alternative, offering an ecosystem that covers my needs, memory safety without garbage collection, and a single-binary deployment model. Four Rust libraries already cover over 90% of an AI Engineer’s and a Data Scientist’s needs:</p>\n<ul>\n <li>\n<a href=\"https://rig.rs/\">Rig</a> for LLM applications </li>\n <li>\n<a href=\"https://docs.rs/ndarray/\">ndarray</a> for linear algebra </li>\n <li>\n<a href=\"https://plotters-rs.github.io/home/\">plotters</a> for visualisation </li>\n <li>\n<a href=\"https://docs.pola.rs/\">Polars</a> for DataFrames </li>\n</ul>\n<h2>\nPhased Migration Strategy</h2>\n<p>\n<strong>1. Start with small, self-contained utilities</strong></p>\n<ul>\n <li>\nBegin by rewriting simple command-line tools or utilities </li>\n <li>\nFocus on pure functions with clear inputs/outputs </li>\n <li>\nExamples: data processors, validators, or simple APIs </li>\n</ul>\n<p>\n<strong>2. Learn incrementally through practical patterns</strong></p>\n<pre><code class=\"rust\">// Python (shown as comments for comparison):\n// def process_data(items):\n//     return [x * 2 for x in items if x > 0]\n\n// Rust equivalent:\nfn process_data(items: &[i32]) -> Vec<i32> {\n    items.iter().filter(|x| **x > 0).map(|x| x * 2).collect()\n}</code></pre>\n<p>\n<strong>3. 
Adopt a hybrid approach during transition</strong></p>\n<ul>\n <li>\nUse <a href=\"https://pyo3.rs/\">PyO3</a> to call your new Rust code from existing Python </li>\n <li>\nGradually replace performance-critical components first </li>\n <li>\nKeep Python for rapid prototyping until comfortable with Rust </li>\n</ul>\n<p>\n<strong>4. Leverage familiar concepts across languages</strong></p>\n<ul>\n <li>\nRust’s iterators ≈ Python’s generators </li>\n <li>\nRust’s closures ≈ Python’s lambda functions </li>\n <li>\nRust’s Option/Result ≈ Python’s Optional/try-except </li>\n</ul>\n<h2>\nKeeping Rust Simple and Robust</h2>\n<p>\n<strong>1. Start with a subset of Rust features</strong></p>\n<ul>\n <li>\nFocus on structs, enums, and basic pattern matching </li>\n <li>\nDefer learning advanced traits, lifetimes, and generics </li>\n <li>\nUse <code class=\"inline\">#[derive]</code> macros to avoid boilerplate </li>\n</ul>\n<p>\n<strong>2. Adopt consistent patterns</strong></p>\n<pre><code class=\"rust\">// Prefer simple error handling patterns\n// (`Database`, `User`, and `Error` are placeholders for your own types)\nfn get_user(db: &Database, id: u64) -> Result<User, Error> {\n    let user = db.find_user(id)?; // Early return on error\n    Ok(user)\n}</code></pre>\n<p>\n<strong>3. Minimise complexity with good defaults</strong></p>\n<ul>\n <li>\nUse <code class=\"inline\">String</code> over <code class=\"inline\">&str</code> for return values until comfortable with lifetimes </li>\n <li>\nStart with <code class=\"inline\">Vec<T></code> before learning specialised collections </li>\n <li>\nPrefer <code class=\"inline\">.clone()</code> initially where ownership is complex </li>\n</ul>\n<p>\n<strong>4. Focus on idiomatic Rust patterns</strong></p>\n<ul>\n <li>\nPrefer composition over inheritance </li>\n <li>\nUse enums for representing state </li>\n <li>\nLeverage the type system to make invalid states unrepresentable </li>\n</ul>\n<p>\n<strong>5. 
Practical tooling setup</strong></p>\n<ul>\n <li>\nInstall <code class=\"inline\">rust-analyzer</code> for your editor or IDE of choice </li>\n <li>\nUse <code class=\"inline\">clippy</code> to learn idiomatic Rust: <code class=\"inline\">cargo clippy</code> </li>\n <li>\nAdopt <code class=\"inline\">cargo fmt</code> for consistent formatting </li>\n</ul>\n<p>\nThis approach prioritises practical learning over theoretical completeness, allowing you to become productive quickly while gradually adopting Rust’s more powerful features as needed.</p>\n<h2>\nConclusion</h2>\n<p>\nMigrating from Python to Rust can offer considerable long-term benefits, including a cohesive ecosystem, native performance, and streamlined deployment, without requiring complete rewrites. By following a gradual migration path, leveraging hybrid integration, and deliberately limiting your initial exposure to Rust’s complexity, you can maintain productivity while gaining experience. This approach can lead to a minimalist software development cycle that produces increasingly robust software over time.</p>\n",
"tags": [
"rust",
"python",
"ai-engineering",
"data-science",
"migration",
"type-safety",
"performance",
"productivity",
"software-minimalism"
]
},
{
"date": "2025-08-26",
"title": "💡 TIL: Incremental AI Problem-Solving with Solveit",
"url": "/posts/TIL-solveit.html",
"content": "<p>\n<strong>TL;DR:</strong> Answer.ai’s Solveit approach mitigates LLM deterioration by breaking tasks into small steps, editing AI responses, and providing curated context, addressing three key issues: RLHF-trained overenthusiasm, autoregressive decline, and training data flaws.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nA student from fast.ai’s “<a href=\"http://solveit.fast.ai/\">Solve It With Code</a>” course documented three LLM properties that cause deteriorating AI responses and corresponding mitigation techniques. The course, led by <a href=\"https://nitter.poast.org/jeremyphoward\">Jeremy Howard</a> and <a href=\"https://nitter.poast.org/johnowhitaker\">Johno Whitaker</a>, focuses on transforming problematic AI interactions into learning experiences through systematic problem-solving.</p>\n<p>\nThe approach addresses what the author terms the “deteriorating response pattern”, where AI tools initially appear helpful but produce increasingly broken code through subsequent iterations. A common scenario: request a weather app from ChatGPT, receive 100 lines of code that don’t work, request fixes, and encounter additional bugs. 
This occurs due to fundamental LLM properties, not implementation flaws.</p>\n<h2>\nThe Three Properties & Solutions</h2>\n<ol>\n <li>\n<em>RLHF creates overly eager helpers</em> </li>\n</ol>\n<p>\nIssue: Human raters prefer complete responses, so models provide overwhelming amounts of information at once.<br>\nSolution: Work in small steps and ask clarifying questions first. This is based on Pólya’s problem-solving framework: understand → plan → implement → review.</p>\n<ol start=\"2\">\n <li>\n<em>Autoregression leads to deterioration</em> </li>\n</ol>\n<p>\nIssue: Responses degrade over long conversations as models revert to mediocre training patterns.<br>\nSolution: Edit the LLM’s responses to shape better patterns, pre-fill outputs, and use examples. This involves rewriting AI responses to match your preferred style, then using those as context for subsequent interactions.</p>\n<ol start=\"3\">\n <li>\n<em>Training data is flawed/outdated</em> </li>\n</ol>\n<p>\nIssue: Hallucinations and outdated information result from lossy compression of training data.<br>\nSolution: “Jeremy RAG”, i.e. manually curating relevant context rather than relying on automated retrieval systems. Tools like <a href=\"https://github.com/AnswerDotAI/contextkit\">contextkit</a> enable inclusion of specific documentation, followed by verification that the LLM correctly interprets the provided context.</p>\n<h2>\nApplication to Modern AI Systems</h2>\n<p>\nThe methodology remains relevant for reasoning models like Claude Code or OpenAI’s Deep Research. 
The primary challenge isn’t that models cannot answer questions, but rather that users often don’t know which questions to ask initially.</p>\n<p>\nJeremy connected this to Eric Ries’ <a href=\"https://theleanstartup.com/principles\">Lean Startup methodology</a>: working in small steps enables adaptation of thinking and refinement of the actual question being posed (paraphrased from the original).</p>\n<h2>\nConclusion</h2>\n<p>\nThe Solveit approach transforms problematic AI interactions into learning experiences through iterative, step-by-step problem-solving where each stage builds understanding. By breaking down complex tasks and maintaining control over the conversation flow, users can achieve more reliable results with AI assistants.</p>\n<p>\n<em>Note: Solveit remains unreleased, but these principles apply to existing AI tools.</em></p>\n<p>\nDo you find that decomposing complex problems into smaller components reveals different requirements than initially anticipated?</p>\n",
"tags": [
"til",
"fast-ai",
"answer-ai",
"solveit",
"ai",
"best-practices",
"llm",
"performance",
"productivity",
"prompt-engineering"
]
},
{
"date": "2025-07-03",
"title": "💡 TIL: Engineering Prompts Double as Human Checklists",
"url": "/posts/TIL-prompts-as-human-checklists.html",
"content": "<p>\n<strong>TL;DR:</strong> Well-crafted AI system prompts, like those in Microsoft’s VSCode Copilot Chat extension, serve as excellent process documentation and step-by-step checklists that human developers can follow to improve their own workflows and debugging methodologies.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nI was exploring Microsoft’s recently open-sourced <a href=\"https://github.com/microsoft/vscode-copilot-chat/\">VSCode Copilot Chat extension</a> codebase when I noticed something interesting: the prompts that power AI coding assistants make excellent checklists for human developers too.</p>\n<h2>\nEngineering Prompts as Process Documentation</h2>\n<p>\nTake this <a href=\"https://github.com/microsoft/vscode-copilot-chat/blob/main/src/extension/prompts/node/agent/agentInstructions.tsx#L197\">agent instruction prompt</a> for example; it’s essentially a 24-step debugging methodology distilled from countless hours of human engineering experience:</p>\n<ol>\n <li>\nInitialize Git and explore the repository structure </li>\n <li>\nCreate a reproduction script to confirm the issue </li>\n <li>\nExecute the script to document the exact error </li>\n <li>\nAnalyse the root cause </li>\n <li>\nRead relevant code blocks before making changes </li>\n <li>\nDevelop comprehensive test cases </li>\n <li>\nStage files in Git before editing </li>\n <li>\nApply fixes iteratively… </li>\n</ol>\n<p>\nAnd so on. Each step represents a best practice that seasoned developers follow instinctively.</p>\n<p>\nThe <a href=\"https://github.com/microsoft/vscode-copilot-chat/blob/40d039d8e08c2d17435a2e65846120c394d0727b/src/extension/xtab/common/promptCrafting.ts#L34\">system prompt template</a> is equally instructive. It emphasises context analysis, consistency, and understanding developer intent before suggesting changes.</p>\n<p>\nWhat’s brilliant is that these prompts aren’t just instructions for AI; they’re codified human expertise. 
When we craft prompts for AI systems, we’re essentially documenting our own thought processes and best practices. The better the prompt, the better the human process it represents.</p>\n<h2>\nConclusion</h2>\n<p>\nNext time you’re debugging a tricky issue or refactoring complex code, consider following the same systematic approach these AI prompts encourage. After all, good prompts are just good processes made explicit.</p>\n",
"tags": [
"ai",
"best-practices",
"prompt-engineering",
"system-prompts",
"code-quality",
"productivity",
"til",
"debugging"
]
},
{
"date": "2025-03-28",
"title": "🔨 REWORK",
"url": "/posts/rework-the-art-of-working-smarter.html",
"content": "<p>\n<strong>TL;DR:</strong> Basecamp founders challenge conventional business wisdom in their book “Rework,” advocating for simplicity, constraints, sustainable work hours, and focused execution over endless planning, rapid growth, and workaholism, presenting practical principles for building profitable, sustainable businesses with minimal resources.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThe traditional approach to business often involves comprehensive planning, rapid growth, long hours, and complex processes. But is this truly the most effective way to succeed? <a href=\"https://world.hey.com/jason\">Jason Fried</a> and <a href=\"https://world.hey.com/david\">David Heinemeier Hansson</a>, founders of <a href=\"https://en.wikipedia.org/w/index.php?title=Basecamp_(company)&redirect=no\">Basecamp</a> (now <a href=\"https://en.wikipedia.org/wiki/37signals\">37signals</a>), challenge these conventional notions in their influential book “<a href=\"https://basecamp.com/books/rework\">Rework</a>”. Published in 2010, this manifesto presents an alternative philosophy for building successful businesses in the digital age, one that emphasises simplicity, efficiency, and balance. Drawing from their experience creating profitable web applications with a small team, Fried and Hansson offer practical insights for entrepreneurs and companies of all sizes. Their approach advocates working smarter rather than harder, focusing on what truly matters, and challenging business orthodoxy at every turn.</p>\n<h2>\nFoundational Principles of “Rework”</h2>\n<h3>\nEmbrace Simplicity and Constraints</h3>\n<p>\nFried and Hansson consistently emphasise that constraints aren’t limitations but advantages. 
With limited resources, you’re forced to focus on what’s essential:</p>\n<ul>\n <li>\n<strong>Build half a product, not a half-ar🤬ed product</strong>: It’s better to do fewer things exceptionally well than to attempt everything poorly. Quality trumps quantity. </li>\n <li>\n<strong>Embrace constraints</strong>: Limited time, budget, or people can spark creativity and force efficiency. They make you focus on doing more with less. </li>\n <li>\n<strong>Underdo your competition</strong>: Instead of adding more features than competitors, focus on solving core problems elegantly. Simplicity is a competitive advantage. </li>\n</ul>\n<p>\nThe authors point to specific examples, such as how Basecamp launched without billing functionality (adding it 30 days later) and how the Flip video camera succeeded by deliberately omitting features that competitors deemed essential.</p>\n<h3>\nChallenge Traditional Business Thinking</h3>\n<p>\n“Rework” consistently questions business conventions that many take for granted:</p>\n<ul>\n <li>\n<strong>Planning is guessing</strong>: Detailed long-term business plans are often exercises in fiction. Instead, make decisions just in time with the most current information available. </li>\n <li>\n<strong>Working long hours is counterproductive</strong>: “Workaholism” leads to burnout and mediocre output. Productivity isn’t about hours worked but about focused, quality work. </li>\n <li>\n<strong>Growth isn’t always good</strong>: The authors argue against the obsession with expansion, suggesting companies find their “right size” and focus on sustainability rather than constant growth. </li>\n <li>\n<strong>Skip the “rock stars”</strong>: Instead of obsessing over hiring “ninjas” or “rock stars,” create an environment where ordinary people can do extraordinary work. 
</li>\n</ul>\n<h3>\nFocus on Action Over Discussion</h3>\n<p>\nA central theme in “Rework” is the importance of creating rather than just talking about creating:</p>\n<ul>\n <li>\n<strong>Start making something</strong>: Ideas are worthless without execution. The world is filled with people who “had that idea first” but never acted on it. </li>\n <li>\n<strong>Launch now</strong>: Perfection is unattainable; get your product out quickly and iterate based on real feedback rather than assumptions. </li>\n <li>\n<strong>Meetings are toxic</strong>: They interrupt productivity, waste collective time, and often accomplish little. Minimise them ruthlessly. </li>\n</ul>\n<p>\nThe authors illustrate this with their own experience building Basecamp, launching quickly with core functionality and improving based on actual customer feedback rather than theoretical market research.</p>\n<h3>\nBuild an Audience-Focused Business</h3>\n<p>\nFried and Hansson outline a customer-centric approach to business development:</p>\n<ul>\n <li>\n<strong>Out-teach your competition</strong>: Share knowledge generously through blogs, articles, and tutorials. Teaching establishes authority and builds trust with potential customers. </li>\n <li>\n<strong>Build an audience</strong>: Develop a following of people interested in what you have to say. When you launch products, you’ll already have an engaged audience. </li>\n <li>\n<strong>Emulate chefs</strong>: Like celebrity chefs who share their recipes freely, sharing your expertise doesn’t diminish your business; it enhances it. </li>\n</ul>\n<p>\nTheir company blog, Signal vs. 
Noise, exemplifies this approach, having built an audience of over 100,000 daily readers who became a natural customer base.</p>\n<h3>\nCreate a Sustainable Work Culture</h3>\n<p>\nThe authors advocate for work environments that prioritise sustainability over burnout:</p>\n<ul>\n <li>\n<strong>Send people home at 5</strong>: Reasonable working hours increase per-hour productivity and lead to more creative solutions. </li>\n <li>\n<strong>Avoid policies that treat people like children</strong>: Trust adults to manage their time and make good decisions. </li>\n <li>\n<strong>Avoid unnecessary formality</strong>: Communicate in a human voice rather than corporate-speak. Sound like yourself, not like a faceless entity. </li>\n</ul>\n<h2>\nPotential Limitations</h2>\n<p>\nWhile “Rework” offers valuable counter-conventional wisdom, some of its approaches may not suit all business contexts. The authors’ philosophy works particularly well for software and service businesses with low overhead, but manufacturing or capital-intensive industries may require more traditional planning. Additionally, their “do less” approach might not always scale for businesses with complex regulatory requirements or those serving enterprise clients with extensive needs.</p>\n<h2>\nConclusion</h2>\n<p>\n“Rework” offers a refreshing alternative to conventional business wisdom, advocating for a more thoughtful, balanced, and human approach to work. The book’s central message is that success doesn’t require sixty-hour workweeks, venture capital, or extensive planning; it requires focus on what matters, elimination of what doesn’t, and dedication to quality execution.</p>\n<p>\nBy challenging assumptions about growth, working hours, planning, and hiring, Fried and Hansson present a blueprint for building businesses that are not only profitable but also sustainable and enjoyable to run. 
Their philosophy can be distilled to a few key principles: embrace constraints, focus on quality over quantity, prioritise action over planning, and build businesses that respect both customers and employees.</p>\n<p>\nWhether you’re running a startup, managing a team, or simply looking to work more effectively, “Rework” provides valuable insights for doing more with less and building something that lasts. It’s not about working more; it’s about working smarter.</p>\n",
"tags": [
"37signals",
"best-practices",
"productivity",
"efficiency",
"company-culture",
"remote-work",
"minimal",
"business-value"
]
},
{
"date": "2025-03-26",
"title": "🌐 Remote: Office Not Required",
"url": "/posts/remote-office-not-required.html",
"content": "<p>\n<strong>TL;DR:</strong> Basecamp founders present a comprehensive guide to remote work, arguing that distributed teams offer increased productivity, access to global talent, and better work-life balance whilst outlining practical strategies for effective communication, maintaining culture, and overcoming common objections to remote collaboration.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<h2>\nWhy Remote Work Makes Sense</h2>\n<h3>\nThe Office Paradox</h3>\n<p>\nOne of the book’s most compelling arguments is that traditional offices often hinder productivity rather than enhance it. Fried and Hansson point out that when people need to get serious work done, they rarely cite the office as their preferred location. Instead, they choose early mornings, late evenings, or weekends, times when interruptions are minimal.</p>\n<p>\nOffices have become “interruption factories” where meaningful work is chopped into small, ineffective chunks. Meetings, impromptu desk visits, and constant noise create an environment where deep, focused work becomes nearly impossible. Remote work, by contrast, allows people to create their own distraction-free environments.</p>\n<h3>\nThe End of Commuting</h3>\n<p>\nAnother significant advantage of remote work is eliminating the commute. Beyond the obvious time savings, research shows that long commutes correlate with increased obesity, stress, neck and back pain, and even higher divorce rates. The authors calculate that an average commute consumes 300-400 hours per year, time that could be redirected toward productive work or personal well-being.</p>\n<h3>\nAccess to Global Talent</h3>\n<p>\nPerhaps most importantly, remote work dramatically expands the talent pool. Instead of limiting hiring to a specific geographical area, companies can recruit from anywhere in the world. 
This not only increases the chances of finding exceptional talent but also naturally leads to a more diverse workforce with varied perspectives.</p>\n<h2>\nMaking Remote Work Work</h2>\n<h3>\nCommunication: The Key to Success</h3>\n<p>\nEffective remote work hinges on communication. The book advocates for a blend of synchronous and asynchronous communication methods:</p>\n<ol>\n <li>\n<strong>Overlap Time</strong>: Ensure team members have at least 4 hours of overlap in their workdays to allow for real-time collaboration when needed. </li>\n <li>\n<strong>Screen Sharing</strong>: Use tools like WebEx, GoToMeeting, or Join.me to collaborate visually, making it feel more like sitting side-by-side. </li>\n <li>\n<strong>Transparent Documentation</strong>: Make information accessible to everyone regardless of time zone, eliminating bottlenecks. Basecamp, their project management tool, was designed specifically with this in mind. </li>\n <li>\n<strong>Virtual Water Cooler</strong>: Create spaces for casual conversation to maintain company culture and social connections. The authors used Campfire, their web-based chat service, for this purpose. While Campfire was discontinued as a standalone product, it was relaunched in 2024 as part of their ONCE line, allowing users to purchase and self-host the software on their own servers rather than subscribing to a SaaS model. </li>\n</ol>\n<h3>\nNavigating Legal and Financial Considerations</h3>\n<p>\nThe book doesn’t shy away from the practical challenges of remote work. In the chapter “Taxes, accounting, laws, oh my!” the authors tackle the nuts and bolts of remote employment:</p>\n<ul>\n <li>\n<strong>Domestic remote work</strong> is relatively straightforward from a legal standpoint, with few complications beyond potential state tax implications if employees work across state lines. </li>\n <li>\n<strong>International remote work</strong> presents more challenges. 
The authors outline two main approaches: establishing a local office (expensive but comprehensive) or hiring people as contractors (simpler but with limitations on benefits and employment protections). </li>\n <li>\n<strong>For remote workers</strong>, they recommend setting up a personal company and billing as a contractor if working for an international company, though they acknowledge this isn’t a perfect solution. </li>\n</ul>\n<p>\nThe authors are refreshingly honest here, acknowledging that running with a less-than-perfect legal setup is common practice, though they recommend consulting professionals for complex situations.</p>\n<h3>\nOvercoming Common Objections</h3>\n<p>\nFried and Hansson systematically address the objections typically raised against remote work:</p>\n<ul>\n <li>\n<strong>“How do I know people are working?”</strong> If you can’t trust employees to work remotely, the issue is hiring, not location. </li>\n <li>\n<strong>“What about security?”</strong> With proper protocols, remote work can be just as secure as office work. </li>\n <li>\n<strong>“We need face-to-face meetings.”</strong> Most meetings can be conducted effectively online, and occasional in-person gatherings can satisfy the need for face time. </li>\n <li>\n<strong>“We need to maintain our culture.”</strong> Culture stems from values and actions, not physical proximity. </li>\n</ul>\n<h3>\nAvoiding Remote Work Pitfalls</h3>\n<p>\nThe book doesn’t gloss over remote work’s challenges:</p>\n<ol>\n <li>\n<strong>Isolation</strong>: Combat loneliness by encouraging employees to work from co-working spaces or cafés occasionally. </li>\n <li>\n<strong>Overwork</strong>: Without clear boundaries, remote workers may struggle to disconnect. 
Managers should focus on results rather than hours worked and look out for signs of burnout. </li>\n <li>\n<strong>Communication Barriers</strong>: When face-to-face interaction is limited, misunderstandings can occur. Clear, thoughtful communication becomes even more crucial. </li>\n</ol>\n<h2>\nBuilding and Managing a Remote Team</h2>\n<h3>\nHiring for Remote Work</h3>\n<p>\nThe authors emphasise that great remote workers possess certain qualities:</p>\n<ul>\n <li>\n<strong>Self-motivation</strong>: They can stay productive without direct supervision. </li>\n <li>\n<strong>Strong writing skills</strong>: Since much of remote communication is written, clear writing is essential. </li>\n <li>\n<strong>Results-oriented mindset</strong>: They focus on output rather than hours at a desk. </li>\n</ul>\n<h3>\nCreating Trust and Accountability</h3>\n<p>\nRather than micromanaging, successful remote teams are built on trust. The book recommends:</p>\n<ul>\n <li>\n<strong>Focus on outputs</strong>: Judge work by what’s accomplished, not when or how it’s done. </li>\n <li>\n<strong>Regular check-ins</strong>: Brief one-on-ones help maintain connection without becoming burdensome. </li>\n <li>\n<strong>Eliminate roadblocks</strong>: Ensure remote workers have the authority and access they need to be effective. 
</li>\n</ul>\n<h3>\nThe Remote Toolbox</h3>\n<p>\nThe authors provide a practical “Remote Toolbox” with specific recommendations:</p>\n<ul>\n <li>\n<strong>Basecamp</strong>: Their own project management tool for organising tasks, discussions, and files in one central location. </li>\n <li>\n<strong>Video conferencing tools</strong>: Google Hangouts (now Google Meet) for group video calls with up to 10 people. </li>\n <li>\n<strong>Screen sharing</strong>: WebEx, GoToMeeting, and Join.me for collaboration and demonstrations. </li>\n <li>\n<strong>File sharing</strong>: Dropbox for keeping files synchronised across multiple devices and locations. </li>\n <li>\n<strong>Collaborative documents</strong>: Google Docs for real-time collaboration on text documents and spreadsheets. </li>\n <li>\n<strong>Co-working directories</strong>: Resources like Regus, LiquidSpace, Desktime, and the Coworking Wiki to find workspaces while travelling or to escape home office isolation. </li>\n</ul>\n<p>\nMany of these tools have evolved since the book’s publication, but the core functions they serve remain essential to remote work.</p>\n<h3>\nThe Importance of Meetups</h3>\n<p>\nDespite advocating for remote work, the authors stress the value of occasional in-person gatherings. At their company, they met at least twice yearly for 4-5 days[^1]. These meetups strengthen personal bonds, allow for intensive collaboration, and reinforce company culture.</p>\n<h2>\nConclusion</h2>\n<p>\n“Remote: Office Not Required” provides a comprehensive blueprint for implementing successful remote work practices. The authors convincingly argue that remote work offers numerous advantages: increased productivity, access to global talent, better work-life balance, and reduced overhead costs.</p>\n<p>\nWhat makes this book particularly valuable is its grounding in real-world experience. 
Fried and Hansson have built their business on these principles and have navigated the challenges they describe.</p>\n<p>\nAs we continue to redefine what work means in the 21st century, the insights from “Remote” remain highly relevant. The authors envision a future where work is judged by results rather than location, where talent knows no geographical boundaries, and where both companies and employees enjoy greater freedom and flexibility.</p>\n<p>\nFor businesses looking to thrive in this new landscape, “Remote” offers not just philosophy but practical strategies for turning the challenges of distributed work into competitive advantages. The future of work is indeed remote, and this book provides an excellent road map for that journey.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Reasonable adjustments must be considered for those who are not able to travel far due to health, family or other reasons beyond their control. It is possible to build and maintain a strong culture that does not necessitate travelling, or at least not travelling often or far, if circumstances don’t allow.</p>\n",
"tags": [
"37signals",
"remote-work",
"advantage",
"company-culture",
"productivity",
"best-practices",
"decision-making",
"onboarding"
]
},
{
"date": "2025-03-23",
"title": "😌 It Doesn't Have to Be Crazy at Work",
"url": "/posts/it-doesnt-have-to-be-crazy-at-work-37-signals.html",
"content": "<p>\n<strong>TL;DR:</strong> Basecamp founders reject the “crazy busy” workplace culture, advocating instead for a “calm company” approach that emphasises reasonable 40-hour workweeks, focused attention, asynchronous communication, and flexible project scope, proving that sustainable work practices can yield successful businesses without sacrificing employee wellbeing.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nIn today’s hyperactive business environment, “crazy busy” has become a badge of honour. Endless workweeks, constant interruptions, and the expectation of instant responses have created workplaces where stress is the norm and burnout is inevitable. But does it have to be this way? <a href=\"https://world.hey.com/jason\">Jason Fried</a> and <a href=\"https://world.hey.com/david\">David Heinemeier Hansson</a>, the founders of <a href=\"https://en.wikipedia.org/w/index.php?title=Basecamp_(company)&redirect=no\">Basecamp</a> (the name the company used from 2014 until it reverted to <a href=\"https://en.wikipedia.org/wiki/37signals\">37signals</a> in 2022), argue emphatically that it doesn’t. In their book “<a href=\"https://basecamp.com/books/calm\">It Doesn’t Have to Be Crazy at Work</a>”, they present a compelling case for a calmer, more sustainable approach to work, one where companies can still be successful without sacrificing the well-being of their employees.</p>\n<p>\nThe authors, who have built a profitable business with minimal stress and reasonable working hours, dismiss the idea that growth-at-all-costs and around-the-clock work schedules are necessary for success. Instead, they advocate for what they call a “calm company”: an organisation that values sustainable workloads, reasonable expectations, sufficient rest, and focused productivity. 
Let’s dive into the key principles that can help transform a frantic workplace into a calm and productive environment.</p>\n<h2>\nThe Calm Company Philosophy</h2>\n<h3>\nRethinking Time and Attention</h3>\n<p>\nOne of the core insights from the book is that modern workplaces have become “interruption factories” where meaningful work is nearly impossible. Offices chop the workday into tiny fragments (fifteen minutes here, ten minutes there), with meetings, calls, and constant distractions preventing sustained focus.</p>\n<p>\nThe authors argue that 8-hour workdays and 40-hour workweeks are plenty of time to accomplish great work, provided that time is actually protected. Instead of measuring commitment by hours spent at a desk, a calm company measures results and respects boundaries. Basecamp’s philosophy is straightforward: “Work 40 hours a week, then stop. No all-nighters, no weekends”.</p>\n<p>\nTo protect time, the book advocates for asynchronous communication whenever possible. Not everything requires an immediate response. By promoting a culture of eventual response rather than instant reaction, companies give employees the space for deep, focused work. This might mean designating “library rules” in the office (quiet, focused concentration) and setting clear boundaries for when real-time collaboration is truly necessary.</p>\n<h3>\nEliminate Excessive M&Ms: Meetings and Managers</h3>\n<p>\nMeetings and micromanagement are two primary culprits behind workplace chaos. The authors are particularly critical of the modern meeting culture, noting that “a one-hour meeting with ten people isn’t a one-hour meeting - it’s a ten-hour meeting”. Before calling a meeting, they suggest asking whether it’s truly worth pulling multiple people away from their focused work.</p>\n<p>\nSimilarly, the book challenges managers to stop “managing the chairs” (monitoring when people arrive and leave) and instead focus on managing the work itself. 
This means setting clear expectations, providing necessary resources, removing obstacles, and then trusting people to execute without constant supervision.</p>\n<p>\nAt Basecamp, they’ve institutionalised practices like “office hours”: rather than being constantly available for interruption, experts designate specific times when they’re available for questions. They’ve also moved away from real-time chat for important discussions, recognising that this medium often creates an unhealthy expectation of immediate response.</p>\n<h3>\nReasonable Expectations and Focused Scope</h3>\n<p>\nPerhaps the most radical departure from conventional business thinking is the authors’ approach to goals and expectations. They proudly declare: “We don’t do goals at Basecamp”. Instead of chasing arbitrary targets, they focus on doing excellent work consistently and sustainably.</p>\n<p>\nThe book introduces the concept of “dreadlines” versus deadlines. A dreadline appears when a deadline is paired with an ever-expanding scope. To combat this, Basecamp keeps deadlines fixed but makes scope flexible. Projects can only get smaller over time, not larger, ensuring teams can deliver quality work without burning out.</p>\n<p>\nThis means being deliberate about what not to do. As the authors put it: “Having less to do isn’t a problem, it’s an advantage”. They suggest developing the skill of “narrowing as you go”: starting projects with exploration, then gradually focusing in on what’s truly important as you approach the deadline.</p>\n<p>\nBasecamp also embraces the “disagree and commit” approach to decision-making. Rather than requiring consensus, which can lead to endless debate, someone makes the final call after everyone has been heard, and then the whole team commits to making it work, even if some initially disagreed.</p>\n<h3>\nBuilding a Healthy Remote Work Culture</h3>\n<p>\nRemote work features prominently in Basecamp’s approach to building a calm company. 
By removing the expectation that everyone must be in the same physical space, they’ve created more flexibility while maintaining productivity.</p>\n<p>\nHowever, they emphasise that remote work requires intentionality. Teams need sufficient overlap in working hours, clear communication practices, and strong writing skills. In fact, the authors consider good writing essential for remote teams, as it eliminates ambiguity and creates a clear record of decisions and rationales.</p>\n<p>\nThe authors also address the concern that remote work might lead to isolation or disconnection. They recommend regular in-person meetups[^1] and maintaining a strong company culture based on shared values and respect, not forced socialisation or perks designed to keep people at the office longer.</p>\n<h3>\nHiring and Benefits That Support Life Outside Work</h3>\n<p>\nBasecamp’s approach to hiring focuses on finding talented people who value calm productivity over chaotic hustle. Their compensation philosophy is refreshingly straightforward: equal pay for equal work, regardless of location, with no complex negotiation processes.</p>\n<p>\nTheir benefits are specifically designed to encourage life beyond work. Rather than offering free meals to keep employees in the office longer, they provide benefits that help people disconnect, like paid sabbaticals, summer hours (32-hour workweeks from May through August), and even covering the cost of employees’ vacations. This reinforces their belief that the best workers are well-rested ones with rich lives outside the office.</p>\n<h2>\nConclusion</h2>\n<p>\n“It Doesn’t Have to Be Crazy at Work” presents a refreshing alternative to the burnout culture that pervades much of today’s business world. 
The calm company model isn’t about doing less or lowering standards; it’s about working smarter, focusing on what truly matters, and creating sustainable conditions where people can do their best work without sacrificing their health and happiness.</p>\n<p>\nThe key takeaways from the book include:</p>\n<ol>\n <li>\nProtect people’s time and attention by eliminating unnecessary interruptions </li>\n <li>\nStick to reasonable work hours (40 hours per week is plenty) </li>\n <li>\nReplace constant meetings with more thoughtful, asynchronous communication </li>\n <li>\nFocus on the quality of work rather than hours logged </li>\n <li>\nKeep deadlines fixed but be flexible about scope </li>\n <li>\nBuild a culture of trust where remote work can thrive </li>\n <li>\nBe intentional about what you say no to </li>\n</ol>\n<p>\nAs the authors suggest, “calm is contagious”, and so is crazy. By choosing calm, companies can create environments where employees thrive, creativity flourishes, and sustainable success becomes possible. The choice, as they say, is yours.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Reasonable adjustments must be considered for those who are not able to travel far due to health, family or other reasons beyond their control. It is possible to build and maintain a strong culture that does not necessitate travelling, or at least not travelling often or far, if circumstances don’t allow.</p>\n",
"tags": [
"37signals",
"advantage",
"best-practices",
"decision-making",
"business-value",
"slow-down",
"onboarding",
"remote-work",
"productivity",
"company-culture"
]
},
{
"date": "2025-03-22",
"title": "💡 TIL: A Reactive Python Notebook That Might Replace Jupyter",
"url": "/posts/git-friendly-literate-programming-with-marimo.html",
"content": "<p>\n<strong>TL;DR:</strong> Marimo offers a reactive Python notebook alternative to Jupyter that solves hidden state problems by storing notebooks as pure Python files with deterministic execution based on variable dependencies, making them git-friendly, reproducible, and deployable whilst providing modern features like Vim keybindings and interactive elements.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nAs a long-time Vim/Neovim and IPython user, I’m quite particular about my development environment. So when I say a notebook platform caught my attention enough to consider switching, that’s significant. Recently, I stumbled upon <a href=\"https://marimo.io/\">Marimo</a>, and it might just be the Jupyter alternative I’ve been searching for.</p>\n<h2>\nWhat is Marimo?</h2>\n<p>\nMarimo is a reactive Python notebook environment that solves many long-standing issues with traditional notebooks. Unlike Jupyter, which stores notebooks as JSON with embedded code and outputs, Marimo notebooks are pure Python files that are:</p>\n<ul>\n <li>\n<strong>Reactive</strong>: Run a cell, and Marimo automatically runs dependent cells or marks them as stale </li>\n <li>\n<strong>Consistent</strong>: No hidden state problems that plague traditional notebooks </li>\n <li>\n<strong>Executable</strong>: Can run as standard Python scripts from the command line </li>\n <li>\n<strong>Git-friendly</strong>: Since they’re just <code class=\"inline\">.py</code> files, they work seamlessly with version control </li>\n <li>\n<strong>Deployable</strong>: Easily share as interactive web apps or slides </li>\n</ul>\n<h2>\nWhy This Matters for Literate Programming</h2>\n<p>\nLiterate programming (the approach of writing code as a narrative explanation interleaved with executable components) is incredibly powerful for data science, ML, and AI work. 
It helps create self-documenting, reproducible research and applications.</p>\n<p>\nThe problem with Jupyter has always been that while it looks like literate programming, its execution model (arbitrary cell execution order) and hidden state make it fundamentally unreliable. Marimo solves this by ensuring deterministic execution based on variable dependencies rather than cell position.</p>\n<h2>\nKey Features That Won Me Over</h2>\n<ol>\n <li>\n<strong>Vim keybindings</strong>: As a Neovim user, this is non-negotiable </li>\n <li>\n<strong>Modern editor features</strong>: GitHub Copilot integration, AI completion, and variable explorer </li>\n <li>\n<strong>Reactive runtime</strong>: No more “did I run all the cells in the right order?” problems </li>\n <li>\n<strong>Interactive elements</strong>: Sliders, tables, and plots that automatically update dependent cells </li>\n <li>\n<strong>SQL integration</strong>: Write SQL against dataframes, databases, or other sources right in your notebook </li>\n <li>\n<strong>Package management</strong>: Built-in support for dependency tracking and isolated environments </li>\n <li>\n<strong>Pure Python storage</strong>: No more JSON files with embedded outputs making git diffs unreadable </li>\n</ol>\n<h2>\nComparisons with Other Literate Programming Tools</h2>\n<h3>\nPluto.jl (Julia)</h3>\n<p>\nPluto.jl pioneered the reactive notebook concept that Marimo implements. 
Both share:</p>\n<ul>\n <li>\nAutomatic reactivity based on variable dependencies </li>\n <li>\nDeterministic execution order </li>\n <li>\nInteractive UI elements </li>\n</ul>\n<p>\n<strong>Differences</strong>:</p>\n<ul>\n <li>\nPluto is Julia-specific; Marimo is Python-specific </li>\n <li>\nMarimo stores notebooks as standard <code class=\"inline\">.py</code> files; Pluto stores them as plain <code class=\"inline\">.jl</code> Julia files </li>\n <li>\nMarimo has more built-in integrations with Python data science libraries </li>\n <li>\nPluto has tighter integration with Julia’s capabilities </li>\n</ul>\n<h3>\nLivebook (Elixir)</h3>\n<p>\nLivebook brings reactive notebooks to the Elixir ecosystem, with:</p>\n<ul>\n <li>\nSmart cells for common tasks </li>\n <li>\nBuilt-in deployment capabilities </li>\n <li>\nCollaborative editing </li>\n</ul>\n<p>\n<strong>Differences</strong>:</p>\n<ul>\n <li>\nLivebook embraces Elixir’s concurrency model; Marimo follows Python’s execution model </li>\n <li>\nMarimo’s Python foundation makes it more accessible for data science work </li>\n <li>\nLivebook has more built-in tools for building distributed systems </li>\n</ul>\n<h2>\nPros and Cons</h2>\n<h3>\nPros</h3>\n<ul>\n <li>\n<strong>Reproducibility</strong>: Deterministic execution eliminates the “run cells in wrong order” problem </li>\n <li>\n<strong>Git-friendly</strong>: Pure Python files make version control and collaboration much easier </li>\n <li>\n<strong>No hidden state</strong>: Deleted cell variables are removed from memory </li>\n <li>\n<strong>Deployability</strong>: From notebook to web app with minimal effort </li>\n <li>\n<strong>Testability</strong>: Run standard test suites against your notebooks </li>\n <li>\n<strong>Modern IDE features</strong>: Seems like they’ve thought of everything </li>\n</ul>\n<h3>\nCons</h3>\n<ul>\n <li>\n<strong>Learning curve</strong>: The reactive model requires a shift in thinking if you’re used to Jupyter </li>\n <li>\n<strong>Ecosystem maturity</strong>: Newer than Jupyter, so fewer third-party extensions </li>\n <li>\n<strong>Performance considerations</strong>: Automatic reactivity could cause issues with expensive computations (though there are options to mitigate this[^1]) </li>\n <li>\n<strong>Language limitation</strong>: Python-only, unlike Jupyter which supports multiple kernels </li>\n</ul>\n<h2>\nGetting Started</h2>\n<p>\nInstallation is straightforward:</p>\n<pre><code class=\"bash\">uv pip install marimo\n# or with recommended extras (quoted so shells like zsh don't expand the brackets)\nuv pip install 'marimo[recommended]'\n\n# Try the tutorial\nmarimo tutorial intro</code></pre>\n<h2>\nConclusion</h2>\n<p>\nAs someone deeply invested in both Vim/Neovim and the Python data ecosystem, Marimo strikes an impressive balance. 
It brings the benefits of reactive programming to Python notebooks while maintaining the flexibility and familiarity that Python users expect.</p>\n<p>\nWhat truly sets Marimo apart is how it addresses the fundamental issues of reproducibility and hidden state that have plagued notebooks for years. By treating notebooks as actual programs with deterministic execution, it enables literate programming in a way that Jupyter always promised but never fully delivered.</p>\n<p>\nIs it perfect? No. But it’s the most compelling alternative I’ve seen so far, and I’m seriously considering making the switch for my daily work.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Marimo provides a “lazy” configuration option where cells that would be automatically re-executed are instead marked as “stale”, allowing users to manually control when expensive computations run. Users can also implement caching strategies using Marimo’s built-in caching functionality, compartmentalise heavy computations into separate cells to control their execution flow, or call <code class=\"inline\">mo.stop</code> to halt a cell until a condition is met.</p>\n",
"tags": [
"til",
"data-science",
"python",
"best-practices",
"reproducibility",
"literate-programming",
"jupyter-alternative",
"reactivity"
]
},
{
"date": "2025-03-22",
"title": "⏪ Making Data Transformations Reversible with fasttransform",
"url": "/posts/fasttransform-for-reversible-data-transformations.html",
"content": "<p>\n<strong>TL;DR:</strong> Fast.ai’s fasttransform library makes machine learning data pipelines reversible by pairing each transformation with its inverse, enabling visualisation of transformed data for debugging and utilising multiple dispatch to handle different data types intelligently, which is crucial for understanding model behaviour and identifying spurious correlations.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nMachine learning practitioners face a common problem: after applying multiple transformations to prepare data for training, it becomes difficult to visualise what the model actually sees. This visualisation gap makes debugging challenging and often leads to missing critical insights about model behaviour.</p>\n<p>\nFor example, consider a model built to distinguish wolves from huskies that performs poorly on certain images. Without the ability to easily inspect how transformations affect the input data, one might miss that the model is actually detecting snow (common in wolf photos) rather than the animals themselves.</p>\n<p>\nFast.ai’s solution to this problem is <a href=\"https://github.com/AnswerDotAI/fasttransform\">fasttransform</a>[^1], a library that ensures any transformation applied to data can be easily reversed. Let’s explore how it works and why it matters.</p>\n<h2>\nReversible Pipelines Made Simple</h2>\n<h3>\nThe Problem with One-Way Transforms</h3>\n<p>\nTraditional data transformation pipelines in libraries like PyTorch are one-way streets. Consider this simple example of normalising an image:</p>\n<pre><code class=\"python\">from PIL import Image\nfrom torchvision import transforms as T\n\n# ImageNet per-channel mean and std\nimagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])\n\ntransforms_pt = T.Compose([\n    T.Resize(256),\n    T.CenterCrop(224),\n    T.ToTensor(),\n    T.Normalize(*imagenet_stats)\n])\n\nimg = Image.open("husky.jpeg")\nimg_transformed = transforms_pt(img)</code></pre>\n<p>\nAttempting to visualise <code class=\"inline\">img_transformed</code> results in a mess of pixel values outside the displayable range. 
To see what the model sees, one needs to manually write an inverse transform function:</p>\n<pre><code class=\"python\">def decode_pt(tensor, mean, std):\n    out = tensor.clone()\n    for t, m, s in zip(out, mean, std): t.mul_(s).add_(m)\n    out = out.mul(255).clamp(0, 255).byte()\n    return out</code></pre>\n<p>\nThis is tedious and error-prone, especially as your transformation pipeline grows more complex.</p>\n<h3>\nAn Elegant Solution</h3>\n<p>\nfasttransform takes a fundamentally different approach by pairing each transformation with its inverse. Here’s the same pipeline using fasttransform:</p>\n<pre><code class=\"python\">from fastai.vision.all import *\n\ntransforms_ft = Pipeline([\n    PILImage.create,\n    Resize(256, method="squish"),\n    Resize(224, method="crop"),\n    ToTensor(),\n    IntToFloatTensor(),\n    Normalize.from_stats(*imagenet_stats)\n])\n\n# Transform our image\nfpath = Path("./huskies_vs_wolves/train/husky/husky_0.jpeg")\nimg_transformed = transforms_ft(fpath)\n# To reverse the transformations:\nimg_decoded = transforms_ft.decode(img_transformed)</code></pre>\n<p>\nThe magic lies in how each transform defines both forward and reverse operations:</p>\n<pre><code class=\"python\">class Normalize(Transform):\n    def __init__(self, mean=None, std=None):\n        self.mean = mean\n        self.std = std\n\n    def encodes(self, x): return (x-self.mean) / self.std # forward transform\n    def decodes(self, x): return x*self.std + self.mean # inverse transform</code></pre>\n<p>\nBy defining both <code class=\"inline\">encodes</code> and <code class=\"inline\">decodes</code> methods, fasttransform automatically knows how to reverse your transformations. 
This is particularly valuable when working with fast.ai v2, where this kind of visualisation capability is built directly into core functions like <code class=\"inline\">show_batch</code> and <code class=\"inline\">show_results</code>.</p>\n<h3>\nMultiple Dispatch: The Secret Sauce</h3>\n<p>\nAnother powerful feature of fasttransform is how it handles different types of data. Using a concept called <a href=\"https://www.youtube.com/watch?v=kc9HwsxE1OY\">multiple dispatch</a>[^2], transformations can apply differently based on the type of data they receive.</p>\n<p>\nThis becomes particularly valuable when dealing with images and their labels, allowing a single pipeline to handle both:</p>\n<pre><code class=\"python\"># Function that loads both image and its label\ndef load_img_and_label(fp): return PILImage.create(fp), parent_label(fp)\n\ntransforms_ft = Pipeline([\n    load_img_and_label, # Loads both image and label as a tuple\n    Resize(256, method="squish"),\n    Resize(224, method="crop"),\n    ToTensor(),\n    IntToFloatTensor(),\n    Normalize.from_stats(*imagenet_stats)\n])</code></pre>\n<p>\nThe pipeline intelligently applies each transform only to the appropriate data types, eliminating the need for separate transformation pipelines.</p>\n<h3>\nConnections to Julia’s Multiple Dispatch</h3>\n<p>\nInterestingly, the concept of multiple dispatch that fasttransform leverages is a core feature of the Julia programming language. In Julia, which method of a function gets called depends on the types of all arguments, not just the first one (as in traditional object-oriented programming).</p>\n<p>\nAs explained in Julia’s documentation: “<em>Using all of a function’s arguments to choose which method should be invoked, rather than just the first, is known as multiple dispatch. 
Multiple dispatch is particularly useful for mathematical code, where it makes little sense to artificially deem the operations to ‘belong’ to one argument more than any of the others</em>”.</p>\n<p>\nThe connection to Julia is particularly illuminating, as it demonstrates how concepts from one language can inspire powerful design patterns in another. Just as Julia’s multiple dispatch enables elegant mathematical code, fasttransform’s implementation of this concept allows for cleaner, more intuitive data pipelines in Python.</p>\n<h2>\nConclusion</h2>\n<p>\nfasttransform represents a significant step forward in making machine learning workflows more intuitive and debugging more accessible. By making transformations reversible through paired encode/decode methods and leveraging multiple dispatch to handle different data types intelligently, it solves two fundamental problems in data processing pipelines: the inability to easily reverse transformations to inspect data, and the need for separate transformation pipelines for different types of data.</p>\n<p>\nThe ability to easily visualise transformed data isn’t just convenient; it’s essential for understanding model behaviour and catching issues like the wolf/husky example, where models learn spurious correlations rather than intended features.</p>\n<p>\nAs machine learning systems grow more complex, tools like fasttransform that improve transparency and the ability to debug become increasingly valuable. 
Whether working with images, text, time series, or other data types, being able to see what a model sees provides critical insights that might otherwise be missed.</p>\n<p>\nReturning to our wolf/husky example, the ability to easily visualise transformed data allows researchers to immediately identify that their model is learning to detect snow backgrounds rather than animal features, a crucial insight for building more robust models.</p>\n<p>\nThose interested in trying fasttransform can install it with <code class=\"inline\">pip install fasttransform</code> and check out the <a href=\"https://github.com/AnswerDotAI/fasttransform\">official fasttransform documentation</a> for more examples and detailed API references. The library offers these capabilities with minimal performance overhead, as the paired transformation approach adds negligible computational cost while providing significant benefits for debugging and understanding model behaviour.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Rens Dimmendaal, Hamel Husain, & Jeremy Howard. “<a href=\"https://www.fast.ai/posts/2025-02-20-fasttransform.html\">fasttransform: Reversible Pipelines Made Simple</a>”, fast.ai blog, February 20, 2025.</p>\n<p>\n[^2]: “<a href=\"https://docs.julialang.org/en/v1/manual/methods/\">Methods · The Julia Language</a>”, Julia Documentation, docs.julialang.org.</p>\n",
"tags": [
"machine-learning",
"data-processing",
"fast-ai",
"python",
"data-science",
"optimisation",
"best-practices",
"interpretability"
]
},
{
"date": "2025-03-20",
"title": "🏠 Why Companies and Individuals Are Moving Back from the Cloud",
"url": "/posts/cloud-repatriation-trends-implications.html",
"content": "<p>\n<strong>TL;DR:</strong> Cloud repatriation is gaining momentum with 86% of CIOs planning to move some workloads back on-premises, driven by unexpected costs, performance needs, security concerns, and desire for greater control, though most organisations are adopting hybrid approaches rather than abandoning cloud entirely, strategically placing workloads where they function most efficiently.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThe last decade has witnessed the meteoric rise of cloud computing, with organisations large and small migrating their data, applications, and infrastructure to public cloud environments. The promises were compelling: reduced capital expenditure, unlimited scalability, enhanced flexibility, and access to cutting-edge technologies without the overhead of managing physical infrastructure. However, a notable countertrend has emerged in recent years: cloud repatriation. This phenomenon, sometimes referred to as “reverse cloud migration,” involves moving workloads, applications, and data back from public cloud environments to on-premises data centres, private clouds, or hybrid setups (<a href=\"https://blog.trginternational.com/cloud-repatriation-business-return-on-premises\">TRG International, 2023</a>). 
I’ve previously explored this topic in my article <a href=\"{{ site.baseurl }}{% link _posts/2024-11-14-cloud-repatriation.md %}\">The On-Prem Comeback (aka Cloud Repatriation)</a>, where I introduced the basic concepts and early examples of this trend.</p>\n<p>\nThis article explores the growing cloud repatriation movement, examining why organisations and individuals are reconsidering their cloud-first strategies, the key drivers behind these decisions, and how they’re implementing these transitions to achieve more balanced and optimised IT infrastructures.</p>\n<h2>\nThe Scale of the Cloud Repatriation Movement</h2>\n<p>\nThe repatriation trend is not isolated but represents a significant shift in how organisations approach their IT infrastructure strategy. According to a 2021 survey by IDC cited by <a href=\"https://blog.trginternational.com/cloud-repatriation-business-return-on-premises\">TRG International</a>, 80% of organisations reported repatriating workloads or data from public cloud environments. More recent data from the end of 2024 showed that 86% of CIOs planned to move some public cloud workloads back to private cloud or on-premises, the highest on record for the Barclays CIO Survey (<a href=\"https://www.puppet.com/blog/cloud-repatriation\">Puppet, 2025</a>).</p>\n<p>\nA recent survey by Rackspace found that nearly seven in 10 companies (69%) have moved at least some applications off the cloud and back to on-premises systems or private clouds (<a href=\"https://www.zdnet.com/article/why-some-companies-are-backing-away-from-the-public-cloud/\">ZDNet, 2025</a>).</p>\n<p>\nIt’s important to note that this doesn’t represent a wholesale abandonment of cloud computing. Only about 8% of organisations are moving their entire workloads off the cloud, according to an October 2024 IDC survey (<a href=\"https://www.puppet.com/blog/cloud-repatriation\">Puppet, 2025</a>). 
Most are selectively repatriating specific workloads while maintaining others in the cloud, resulting in more nuanced, hybrid approaches to IT infrastructure.</p>\n<h2>\nKey Drivers of Cloud Repatriation</h2>\n<h3>\nCost Optimisation</h3>\n<p>\nWhile the cloud initially promised cost savings through reduced capital expenditure and operational flexibility, many organisations have experienced what industry experts call “bill shock” as their cloud usage scales. According to <a href=\"https://blog.trginternational.com/cloud-repatriation-business-return-on-premises\">TRG International</a>, “a Gartner study predicts that through 2024, 60% of infrastructure and operations leaders will encounter public cloud cost overruns that negatively impact their on-premises budgets”.</p>\n<p>\nThis cost concern is particularly relevant for organisations with predictable, high-volume workloads. According to <a href=\"https://www.rsa.com/resources/blog/identity-governance-and-administration/cloud-repatriation-why-enterprise-it-is-returning-from-the-cloud/\">RSA</a>, 37signals announced that its “cloud exit” would save more than $10 million over five years. Similarly, a 2022 report by Andreessen Horowitz found that repatriation of cloud workloads could reduce cloud bills by 50% or more for some companies (<a href=\"https://blog.trginternational.com/cloud-repatriation-business-return-on-premises\">TRG International, 2023</a>).</p>\n<p>\nDavid Linthicum, a leading consultant and former CTO with Deloitte, attributes much of this cost issue to technical debt: “<em>They didn’t refactor the applications to make them more efficient in running on the public cloud providers. 
So the public cloud providers, much like if we’re pulling too much electricity off the grid, just hit them with huge bills to support the computational and storage needs of those under-optimized applications</em>“ (<a href=\"https://www.zdnet.com/article/why-some-companies-are-backing-away-from-the-public-cloud/\">ZDNet, 2025</a>).</p>\n<h3>\nPerformance and Latency</h3>\n<p>\nPerformance requirements are driving many repatriation decisions, particularly for applications requiring ultra-low latency. According to <a href=\"https://blog.trginternational.com/cloud-repatriation-business-return-on-premises\">TRG International</a>, “a study by the IEEE found that for certain AI workloads, on-premises GPU clusters outperformed cloud-based solutions by up to 30% in terms of performance per dollar”.\\ This performance concern is especially critical in fields like financial trading, scientific research, and manufacturing where latency can significantly impact outcomes. As <a href=\"https://www.computerweekly.com/feature/Cloud-repatriation-How-to-do-it-successfully\">ComputerWeekly</a> notes, “time-sensitive data includes information that users need to access as rapidly as possible -think financial trading feeds -or where the application is sensitive to latency”.</p>\n<h3>\nSecurity and Compliance</h3>\n<p>\nSecurity concerns and regulatory compliance requirements are powerful motivators for cloud repatriation. According to the Rackspace survey cited by <a href=\"https://www.zdnet.com/article/why-some-companies-are-backing-away-from-the-public-cloud/\">ZDNet</a>, data security and compliance concerns were the most common reason for repatriation, cited by 50% of respondents.\\ The implementation of stringent regulations like GDPR has compelled many organisations to keep certain data within specific geographic boundaries. 
As <a href=\"https://blog.trginternational.com/cloud-repatriation-business-return-on-premises\">TRG International</a> highlights, “<em>The Data Protection Commission reported a 59% increase in GDPR complaints in 2022, underscoring the importance of data sovereignty</em>“.\\ Despite cloud providers’ significant security investments, many organisations prefer to maintain direct control over their most sensitive data. According to <a href=\"https://blog.trginternational.com/cloud-repatriation-business-return-on-premises\">TRG International</a>, “<em>a 2023 Thales Cloud Security Study found that 45% of businesses have experienced a cloud-based data breach or failed audit in the past 12 months, highlighting ongoing security concerns</em>“.</p>\n<h3>\nControl and Vendor Lock-in</h3>\n<p>\nThe desire for greater control over hardware and software configurations, along with concerns about vendor lock-in, are also driving repatriation efforts. On-premises infrastructure offers more customisation possibilities that may not be available in public cloud environments.\\ Richard Robbins, founder of TheTechnologyVault.com, observes that “<em>enterprises don’t like being dependent upon someone else’s cloud infrastructure</em>“ (<a href=\"https://www.zdnet.com/article/why-some-companies-are-backing-away-from-the-public-cloud/\">ZDNet, 2025</a>). This concern is particularly acute among regulated industries such as financial institutions, which are “<em>moving some or all of their web apps from the cloud back to on-prem or to hybrid setups</em>“ due to “vulnerability and downsides to cloud hosting” that make “executives feel nervous about not having more control”.</p>\n<h2>\nThe Emergence of Balanced Approaches</h2>\n<p>\nRather than a binary choice between cloud and on-premises, organisations are increasingly adopting hybrid and multi-cloud approaches that offer the best of both worlds. 
This trend allows organisations to:</p>\n<ul>\n <li>\nKeep sensitive or high-performance workloads on-premises </li>\n <li>\nLeverage cloud services for scalability and innovation </li>\n <li>\nMaintain flexibility to adapt to changing business needs </li>\n</ul>\n<p>\nAccording to <a href=\"https://blog.trginternational.com/cloud-repatriation-business-return-on-premises\">TRG International</a>, “<em>The hybrid cloud market is expected to grow from $85.3 billion in 2022 to $262.4 billion by 2027, according to MarketsandMarkets research</em>“. Similarly, “<em>Flexera’s 2023 State of the Cloud Report revealed that 71% of enterprises are pursuing a hybrid cloud strategy, combining public cloud, private cloud, and on-premises infrastructure</em>“.</p>\n<h2>\nPersonal Cloud Repatriation</h2>\n<p>\nThe repatriation trend isn’t limited to enterprises. Individuals are also exploring self-hosting options for personal data.\\ For example, <a href=\"https://hachyderm.io/@Jeffrey04/114175854454606516\">a fediverse user</a> recently posted about developing a self-hosted photo album application when faced with cloud storage limitations: “<em>Being an enthusiastic photographer, my partner captured moments of us together. However, the increasing stack of photos is accelerating the imminent explosion of my cloud storage</em>“ (<a href=\"https://kitfucoda.medium.com/a-love-story-in-code-building-my-self-hosted-photo-album-b56a4e89ebdd\">KitFu Coda, 2023</a>). This personal project highlights how individuals with technical skills can leverage idle hardware to create cost-effective alternatives to cloud storage services.\\ As <a href=\"https://kitfucoda.medium.com/a-love-story-in-code-building-my-self-hosted-photo-album-b56a4e89ebdd\">they note</a>, “<em>Self-hosting your own data is becoming a trend these days, and it is really not hard to get started</em>“. 
This trend parallels the enterprise movement, with individuals seeking greater control, cost savings, and privacy for their personal data.</p>\n<h2>\nPlanning for Successful Repatriation</h2>\n<p>\nFor organisations considering cloud repatriation, careful planning is essential. Key considerations include:</p>\n<ol>\n <li>\n<strong>Workload Assessment</strong>: Not all workloads benefit equally from repatriation. <a href=\"https://www.computerweekly.com/feature/Cloud-repatriation-How-to-do-it-successfully\">ComputerWeekly</a> advises that “<em>broadly, repatriation might be the best option where data is sensitive, time sensitive or expensive to store in the cloud</em>“. </li>\n <li>\n<strong>Infrastructure Preparation</strong>: Organisations must ensure they have the physical capacity, networking, power, and cooling capabilities to support repatriated workloads. According to <a href=\"https://www.computerweekly.com/feature/Cloud-repatriation-How-to-do-it-successfully\">ComputerWeekly</a>, “<em>a large repatriation project might be a prompt to reorganise the datacentre, perhaps by moving to newer equipment that can pack more storage into a single rack or that consumes less power</em>“. </li>\n <li>\n<strong>Skills Assessment</strong>: <a href=\"https://www.computerweekly.com/feature/Cloud-repatriation-How-to-do-it-successfully\">ComputerWeekly</a> notes the importance of having “<em>enough staff to provision and manage a larger system</em>“ with the necessary “<em>security and privacy skills needed to handle sensitive data</em>“ and “<em>technical know-how to handle mission-critical, latency sensitive applications</em>“. </li>\n <li>\n<strong>Future-Proofing</strong>: 
<a href=\"https://www.rsa.com/resources/blog/identity-governance-and-administration/cloud-repatriation-why-enterprise-it-is-returning-from-the-cloud/\">RSA</a> emphasises the importance of maintaining flexibility: “<em>Organizations should consider the long-term implications of repatriation for their overall IT strategy. This includes planning for future scalability, considering how repatriation fits into the broader digital transformation initiatives, and ensuring that the new infrastructure aligns with long-term business goals</em>“. </li>\n</ol>\n<h2>\nConclusion</h2>\n<p>\nCloud repatriation represents a maturing perspective on IT infrastructure strategy rather than a rejection of cloud computing. As organisations gain experience with cloud environments, they’re becoming more strategic about which workloads belong where, based on factors like cost, performance, security, and control.\\ The future likely belongs to balanced, hybrid approaches that leverage the strengths of both cloud and on-premises infrastructure. As <a href=\"https://www.puppet.com/blog/cloud-repatriation\">Puppet</a> notes, “<em>Cloud repatriation is not an endpoint, but rather a strategic tool in the ongoing evolution of enterprise IT. It empowers organizations to take control of their digital assets, enhance their security posture, and align their technology infrastructure with their business objectives</em>“.\\ For both organisations and individuals, the key is making informed decisions about where and how to deploy IT resources based on specific needs rather than following blanket “cloud-first” or “on-premises-first” policies. This nuanced approach to infrastructure strategy will likely characterise the next phase of digital transformation as the industry moves beyond the initial hype cycles of cloud adoption.</p>\n",
"tags": [
"cloud",
"on-prem",
"performance",
"security",
"mlops",
"deployment",
"best-practices",
"data-science"
]
},
{
"date": "2025-03-20",
"title": "⚙️ Turning Data Science into Real-World Value with The Drivetrain Framework",
"url": "/posts/the-drivetrain-method.html",
"content": "<p>\n<strong>TL;DR:</strong> Jeremy Howard’s Drivetrain Framework transforms data science from isolated predictions to value-creating systems through four steps: defining clear objectives, identifying controllable levers, collecting causal data through deliberate experimentation, and building integrated systems that combine predictive models with optimisation- bridging the gap between analytics and measurable business results.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nMost data science initiatives fail to deliver meaningful impact. Why? Because they focus on prediction rather than action. Organisations spend millions building sophisticated prediction models that tell them what <em>might</em> happen, but provide no clear path to influencing outcomes.\\ This gap between prediction and value creation is what Jeremy Howard, data scientist and entrepreneur, addressed in his transformative “<a href=\"https://www.youtube.com/watch?v=vYrWTDxoeGg\">Drivetrain Framework</a>“ back in 2012. Having successfully applied this approach to revolutionise insurance pricing, Howard outlines a systematic method for connecting data science to tangible business results.\\ The framework isn’t about building more complex algorithms -it’s about constructing systems that link predictions to decisions that drive value. If you’re struggling to translate advanced analytics into bottom-line results or finding your data science investments yield interesting insights but limited action, this framework offers a practical solution to bridge that gap.</p>\n<h2>\nThe Four Critical Components</h2>\n<p>\nThe Drivetrain Framework consists of four interconnected steps that bridge the gap between data and value:</p>\n<h3>\n1. Define Your Objective</h3>\n<p>\nBegin with absolute clarity about what you’re trying to achieve. In Howard’s insurance example, the objective was straightforward: maximise profit from each customer based on price. 
For Google’s search engine, it was finding the most relevant web page based on a query. For a marketing team, it might be maximising customer lifetime value.\\ Without a clear objective, data science becomes an academic exercise. With one, it becomes a targeted tool for value creation.</p>\n<h3>\n2. Identify Your Levers</h3>\n<p>\nNext, determine what variables you can actually control. These are your “levers” -the actions you can take to influence outcomes:</p>\n<ul>\n <li>\nFor Google, the key lever was the ordering of search results </li>\n <li>\nFor insurers, it was the price offered to each customer </li>\n <li>\nFor marketers, levers include product recommendations, discount offers, and communication timing </li>\n</ul>\n<p>\nThe insight here is focusing not on what you can predict, but on what you can change.</p>\n<h3>\n3. Collect Causal Data</h3>\n<p>\nHoward emphasises a crucial distinction: most organisations have plenty of observational data showing correlations, but lack causal data showing what happens when you pull different levers.\\ This requires intentional experimentation:</p>\n<ul>\n <li>\nThe insurance company randomly varied prices to understand true price-response relationships </li>\n <li>\nA marketing team might randomly test diverse recommendations rather than showing what customers already like </li>\n</ul>\n<p>\nThe counterintuitive insight: You must sometimes sacrifice short-term optimisation to collect data that enables superior long-term results. Howard convinced insurance executives to randomise pricing for six months -initially accepting potentially lower profits -to build models that later significantly increased their profitability and transformed how the entire industry approached pricing.</p>\n<h3>\n4. Build an Integrated System</h3>\n<p>\nThe final step combines three elements to connect levers to objectives:</p>\n<ul>\n <li>\n<strong>Modeller</strong>: Build predictive models for key relationships (e.g. 
how price affects purchase probability) </li>\n <li>\n<strong>Simulator</strong>: Combine models to predict outcomes of actions (e.g. how price changes affect profit across customer segments) </li>\n <li>\n<strong>Optimizer</strong>: Find the best lever settings to achieve objectives (e.g. optimal price for each customer) </li>\n</ul>\n<p>\nThis integrated approach replaces the need for complex “PageRank-like” algorithms with systems that combine simpler models to optimise real-world outcomes.</p>\n<h2>\nApplication: Revolutionising Marketing</h2>\n<p>\nHoward suggests marketing analytics remains in the “Dark Ages” and ready for transformation through the Drivetrain approach:\\ Consider Amazon’s recommendation system. Rather than simply suggesting more books by authors you’ve already read, a Drivetrain-based system would:</p>\n<ol>\n <li>\nDefine the objective as maximising customer lifetime value </li>\n <li>\nIdentify recommendation content as a key lever </li>\n <li>\nCollect causal data by testing diverse recommendations, including unexpected ones </li>\n <li>\nBuild an integrated system that models what customers might enjoy but don’t yet know about, and optimises for long-term value </li>\n</ol>\n<p>\nIn Howard’s experience, companies implementing this approach have seen substantial improvements in customer engagement and retention while achieving meaningful reductions in marketing costs.</p>\n<h2>\nDrawing from Engineering</h2>\n<p>\nHoward notes that many solutions already exist in engineering disciplines, which data scientists would benefit from studying.\\ Aircraft designers have used integrated models and optimisation for decades, combining aerodynamic models, structural analysis, and optimisation techniques to create planes that safely fly millions of passengers daily.\\ Building construction similarly relies on systems that integrate architectural models, structural engineering, and materials science to optimize for safety, cost, and 
aesthetics.\\ The most advanced example might be Google’s self-driving car, which integrates multiple predictive models (how the car responds to controls, what sensors detect) with optimisation to safely navigate real-world environments, significantly improving safety in testing environments.\\ These engineering successes demonstrate how combining relatively simple models into integrated systems can solve extraordinarily complex problems.</p>\n<h2>\nConclusion</h2>\n<p>\nThe Drivetrain Framework represents a fundamental shift in how we should approach data science:</p>\n<ol>\n <li>\nMove beyond building better predictive models in isolation </li>\n <li>\nFocus on connecting predictions to actions that drive real value </li>\n <li>\nInvest in collecting causal data through deliberate experimentation </li>\n <li>\nIntegrate modelling, simulation, and optimisation into coherent systems </li>\n</ol>\n<p>\nBy adopting this framework, organisations can bridge the gap between sophisticated analytics and meaningful results. The companies that will gain competitive advantage aren’t those with marginally better algorithms, but those that build integrated systems connecting data to decisions that create value.</p>\n<h2>\nGetting Started</h2>\n<p>\nTo begin implementing the Drivetrain approach:</p>\n<ol>\n <li>\nIdentify one high-value business objective with measurable outcomes </li>\n <li>\nMap the specific levers your team can control that influence this objective </li>\n <li>\nDesign small-scale experiments to collect causal data about these relationships </li>\n <li>\nStart simple -build basic models for key relationships, then integrate them before attempting sophisticated optimisation </li>\n</ol>\n<p>\nThe most important step is shifting your thinking from “<em>what can we predict?</em>“ to “<em>what actions can we take to create value?</em>“ -the essence of the Drivetrain Framework.</p>\n",
"tags": [
"data-science",
"decision-making",
"machine-learning",
"modelling-mindsets",
"optimisation",
"fast-ai",
"advantage",
"best-practices",
"design-principles",
"causal-inference",
"business-value",
"predictive-modelling",
"integration",
"deliberate-experimentation",
"real-value"
]
},
{
"date": "2025-03-19",
"title": "📦 From Compilation to Containerisation and Back Again",
"url": "/posts/compilation-going-back-full-circle.html",
"content": "<p>\n<strong>TL;DR:</strong> Programming languages have evolved from compiled executables to interpreted languages and containerisation, but Deno 2.0 brings deployment full circle by enabling TypeScript/JavaScript compilation into standalone binaries-offering simplified cross-platform deployment whilst maintaining ecosystem richness and enabling single-language development across entire application stacks.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nOver the years, I’ve experimented with numerous programming languages and deployment strategies. Python has been my domain’s lingua franca -with its vast ecosystem for data science and AI applications. However, its deployment complexities have consistently been a pain point: managing dependencies, configuring containers, and setting up build pipelines.\\ This search for a better alternative has led me through statically compiled languages like Go and Rust; JIT-compiled languages like Julia; and hosted languages like Clojure and Scala. Yet most failed to provide a good balance between ecosystem richness and deployment simplicity. Recently, however, Deno <br>\n2.0 has emerged as a compelling solution -particularly with its ability to <br>\ncompile TypeScript (TS) / JavaScript (JS) to standalone executables.</p>\n<h2>\nThe Circular Evolution of Programming Languages</h2>\n<p>\nProgramming languages have undergone a fascinating evolution. In the beginning (the late 1950s and 1960s), languages like Fortran, COBOL, and C were ahead-of-time compiled -transformed directly into machine code executables that could run without additional dependencies.\\ As computing evolved, the pendulum swung toward higher-level languages -interpreted languages like Python and hosted environments like the JVM- prioritising readability and developer productivity over raw performance. 
These languages abstracted away machine-level concerns, allowing developers to focus on solving business problems.\\ Yet this shift introduced new challenges. Python applications often require managing complex dependency trees, virtual environments, and platform-specific configurations. The infamous “<em>works on my machine</em>“ problem became so pervasive that containerisation emerged as a solution.\\ While effective, containerisation introduces its own complexities: orchestration, image management, and networking configurations. What began as a solution to simplify deployment has become a complex system requiring specialised knowledge.</p>\n<h2>\nDeno: Compilation Makes a Comeback</h2>\n<p>\nDeno 2.0 represents a return to first principles. As highlighted in the <a href=\"https://youtube.com/watch?v=ZsDqTQs3_G0\">Run JavaScript Anywhere</a> video, its <code class=\"inline\">compile</code> command enables developers to transform JS and TS programs into standalone binaries that run across major platforms -no runtime installation or dependencies required.</p>\n<pre><code class=\"typescript\">// sample.ts\nimport { open } from "https://deno.land/x/open/index.ts";\n\n// Open a URL in the default browser\nawait open("https://example.com");</code></pre>\n<p>\nWith a simple <code class=\"inline\">deno compile sample.ts</code> command, this code becomes a standalone executable that works on any machine without requiring Deno to be installed.\\ This compilation process isn’t traditional transpilation to machine code -it embeds your JS and TS code into a specialized Deno runtime binary (denort). 
Your script and dependencies are bundled as an eszip file and injected into the runtime binary, creating a self-contained executable that can be code-signed for distribution.</p>\n<p>\nThe key benefits include:</p>\n<ol>\n <li>\n<strong>Cross-platform compatibility</strong> without runtime requirements </li>\n <li>\n<strong>Simplified deployment</strong> with single-binary distribution </li>\n <li>\n<strong>Bundled assets</strong> for complete portability </li>\n <li>\n<strong>Improved startup times</strong> compared to interpreter-based approaches </li>\n</ol>\n<p>\nDeno 2.0 enhances these capabilities further with support for npm packages, web workers, cross-compilation, smaller binary sizes, and code signing with custom icons- making it viable for complete applications, not just scripts.</p>\n<h2>\nThe Single Language Advantage</h2>\n<p>\nBeyond deployment simplicity, using a single language across an entire project stack creates significant organisational benefits. I’ve experienced first-hand how using different languages for front-end, back-end, and data science work can create silos within teams.\\ <a href=\"https://dockyard.com/blog/2024/02/06/5-benefts-amplified-saw-switching-to-elixir\">Amplified’s case study</a> demonstrates this point clearly. 
After switching from a React/JS front-end and Phoenix/Elixir back-end to an all-Elixir approach with LiveView, they reported:</p>\n<ol>\n <li>\n<strong>Halved server costs</strong> through more efficient resource utilisation </li>\n <li>\n<strong>Dramatically increased development speed</strong> by eliminating cross-language silos </li>\n <li>\n<strong>Improved team cohesion</strong> with shared tooling and knowledge </li>\n <li>\n<strong>Enhanced maintainability</strong> through code reuse </li>\n <li>\n<strong>Reduced team size requirements</strong> from 12 developers to just 2 </li>\n</ol>\n<p>\nTS with Deno provides a similar single language opportunity -allowing teams to build front-end interfaces, back-end services, and data processing workflows with the same toolchain. The JS/TS ecosystem is rapidly maturing for AI, ML, and data science applications, as I noted in my previous article on <a href=\"{{ site.baseurl }}{% link _posts/2024-09-05-deno.md %}\">Modern Data Science and AI Engineering with Deno 2.0</a>.\\ One often overlooked benefit is the reduced cognitive load when developers don’t need to context-switch between different language paradigms, package managers, testing frameworks, and debugging approaches.</p>\n<h2>\nPractical Applications</h2>\n<p>\nDeno’s compilation capabilities shine in several real-world scenarios:</p>\n<ol>\n <li>\n<strong>CLI Tools</strong>: Creating self-contained executables that “just work” across platforms without complex installation instructions </li>\n <li>\n<strong>Offline Environments</strong>: Deploying to systems without internet access, where package resolution at runtime isn’t possible </li>\n <li>\n<strong>Cross-Platform Applications</strong>: Building desktop applications that leverage web technologies without requiring a browser runtime </li>\n</ol>\n<h2>\nConclusion</h2>\n<p>\nWe’ve come full circle in programming language evolution -from compiled languages like Fortran in the 1950s, to interpreted 
languages for improved developer experience, to containerisation for managing deployment complexities, and now back to compilation with Deno[^1].\\ Deno’s approach represents a compelling blend- combining deployment simplicity with the ecosystem richness of modern TS/JS. For AI engineering, this addresses many pain points of Python deployment while maintaining access to a growing ecosystem of data science tools.\\ While Elixir offers similar single language benefits, its distribution story remains a work in progress with projects like <a href=\"https://github.com/burrito-elixir/burrito\">Burrito</a> showing promise but not yet fully mature. Until then, Deno stands out as a viable alternative for simplified deployment without sacrificing ecosystem benefits.\\ The future of deployment may look surprisingly like its past, just with better languages and tools at our disposal -offering a path toward more cohesive, efficient software development that reduces complexity without sacrificing capability.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Go, Zig, Rust, C/C++, D, Nim, and Common Lisp are some prominent examples of ahead-of-time compiled languages that -with the exception of Common Lisp- excel in systems programming. However, Deno allows a ubiquitous, higher-level language like JS and its superset TS to join the club of languages that can easily package code to a cross-platform single binary.</p>\n",
"tags": [
"deno",
"typescript",
"deployment",
"cross-platform",
"evolution",
"toolchain",
"best-practices",
"code-quality"
]
},
{
"date": "2025-03-18",
"title": "🧠 RAG vs CAG: Understanding Knowledge Augmentation in LLMs",
"url": "/posts/rag-or-cag.html",
"content": "<p>\n<strong>TL;DR:</strong> Retrieval Augmented Generation (RAG) and Cache Augmented Generation (CAG) represent two distinct approaches to expanding LLM knowledge: RAG dynamically retrieves relevant documents for each query, offering scalability for large datasets, whilst CAG preloads all information into the model’s context window, providing faster responses for smaller, static knowledge bases.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nLarge Language Models (LLMs) face a fundamental knowledge problem: they’re limited to information present in their training data. This creates challenges when dealing with recent events that occurred after training or proprietary information specific to an organization.\\ To address these limitations, two primary augmentation techniques have emerged: Retrieval Augmented Generation (RAG) and Cache Augmented Generation (CAG). This article breaks down both approaches based on <a href=\"https://www.youtube.com/channel/UCKWaEZ-_VweaEx1j62do_vQ\">IBM Technology</a>‘s comprehensive explanation from their <a href=\"https://youtube.com/watch?v=HdafI0t3sEY\">video on RAG vs CAG</a>, examining how they work, their capabilities, and when to use each one.</p>\n<h2>\nUnderstanding RAG and CAG</h2>\n<h3>\nRetrieval Augmented Generation (RAG)</h3>\n<p>\nRAG operates through a two-phase system:</p>\n<ol>\n <li>\n<strong>Offline Phase (Preparation)</strong> </li>\n <li>\n<strong>Online Phase (Query & Response)</strong> <ul>\n <li>\nDocuments are broken into manageable chunks. - Vector embeddings are created for each chunk using an embedding model. - These embeddings are stored in a vector database, creating a searchable knowledge index. - The user submits a query. - The RAG retriever converts this query to a vector using the same embedding model. - The system performs a similarity search in the vector database. - It retrieves the most relevant document chunks (typically 3-5 passages). 
</li>\n <li>\nThese chunks and the user’s query are placed in the LLM’s context window. </li>\n <li>\nThe LLM generates an answer based on both the query and the retrieved context. </li>\n </ul>\n </li>\n</ol>\n<p>\nFor example, if asked <em>“What film won Best Picture this year?”</em>, the system might retrieve information about <em>“Anora”</em> winning the award, even if this occurred after the model’s original training.</p>\n<p>\nA key advantage of RAG is its modularity - components like the vector database, embedding model, or LLM can be swapped independently without rebuilding the entire system.</p>\n<h3>\nCache Augmented Generation (CAG)</h3>\n<p>\nCAG takes a fundamentally different approach:</p>\n<ul>\n <li>\nInstead of retrieving knowledge on demand, CAG preloads all available information into the model’s context window </li>\n <li>\nThe entire knowledge corpus is formatted into one massive prompt that fits within the model’s context limits </li>\n <li>\nThe LLM processes this extensive input in a single forward pass </li>\n <li>\nThe model’s internal state is captured in what’s called a “KV cache” (key-value cache) </li>\n <li>\nWhen a user query arrives, it’s added to this pre-existing KV cache </li>\n <li>\nThe model can access any relevant information from the cache without reprocessing the entire knowledge base </li>\n</ul>\n<p>\nThe fundamental distinction: RAG fetches only what it predicts is needed, while CAG loads everything upfront and remembers it for later use.</p>\n<h2>\nComparing Capabilities</h2>\n<h3>\nAccuracy</h3>\n<ul>\n <li>\n<strong>RAG</strong>: Accuracy depends heavily on the retriever component. If the retriever fails to fetch relevant documents, the LLM won’t have the facts needed to answer correctly. </li>\n <li>\n<strong>CAG</strong>: Guarantees that all information is available (assuming it exists in the knowledge base), but places the burden on the LLM to extract the right information from a large context. 
</li>\n</ul>\n<h3>\nLatency</h3>\n<ul>\n <li>\n<strong>RAG</strong>: Higher latency due to additional steps of embedding the query, </li>\n <li>\n<strong>CAG</strong>: Lower latency once knowledge is cached, as answering queries requires <br>\nsearching the index, and processing retrieved text. only one forward pass without retrieval lookup time. </li>\n</ul>\n<h3>\nScalability</h3>\n<ul>\n <li>\n<strong>RAG</strong>: Can scale to millions of documents as only a small portion is </li>\n <li>\n<strong>CAG</strong>: Limited by the model’s context window size (typically ~32k-100k <br>\nretrieved per query. tokens), restricting it to a few hundred documents at most. </li>\n</ul>\n<h3>\nData Freshness</h3>\n<ul>\n <li>\n<strong>RAG</strong>: Easy to update incrementally as you add new document embeddings or </li>\n <li>\n<strong>CAG</strong>: Requires recomputation when data changes, making it less suitable for <br>\nremove outdated ones. frequently updated information. </li>\n</ul>\n<h2>\nWhen to Use Each Approach</h2>\n<p>\nThe video presents several scenarios to illustrate when each approach is more appropriate:</p>\n<ol>\n <li>\n<strong>IT Help Desk Bot with Static Manual (200 pages, rarely updated)</strong> </li>\n <li>\n<strong>Legal Research Assistant (Thousands of constantly updated documents)</strong> </li>\n <li>\n**Clinical Decision Support System (Patient records, treatment guides, drug <ul>\n <li>\n<strong>Best Choice</strong>: CAG - <strong>Rationale</strong>: Knowledge base is small enough to fit in most LLM context windows, information is static, and caching enables faster query responses. - <strong>Best Choice</strong>: RAG - <strong>Rationale</strong>: Knowledge base is massive and dynamic, precise citations are required, and incremental updates are essential. 
interactions)<strong> - </strong>Best Choice<strong>: Hybrid Approach - </strong>Rationale**: Use RAG to retrieve relevant subsets from the massive knowledge base, then load that retrieved content into a long-context model using CAG for follow-up questions. </li>\n </ul>\n </li>\n</ol>\n<h2>\nConclusion</h2>\n<p>\nThe choice between RAG and CAG ultimately depends on your specific use case. Consider RAG when dealing with large or frequently updated knowledge sources, when citations are necessary, or when resources for running long-context models are limited. CAG is preferable when working with a fixed knowledge set that fits within your model’s context window, when low latency is crucial, or when you want to simplify deployment.\\ As LLM technology evolves with expanding context windows and improved retrieval mechanisms, we may see these approaches converge or new hybrid solutions emerge. For now, understanding the strengths and limitations of both RAG and CAG allows AI engineers to make informed decisions about knowledge augmentation strategies that best suit their specific applications.</p>\n",
"tags": [
"rag",
"llm",
"ai",
"machine-learning",
"prompt-engineering",
"nlp",
"data-processing",
"best-practices"
]
},
{
"date": "2025-03-15",
"title": "🤖 The State of AI Agents in 2025",
"url": "/posts/navigating-ais-frontier-2025.html",
"content": "<p>\n<strong>TL;DR:</strong> Despite significant advancements creating a “perfect storm” for AI agents in 2025, truly autonomous systems still face five categories of cumulative errors that prevent reliable performance; overcoming these challenges requires focused strategies in data curation, robust evaluation frameworks, scaffolding systems, distinctive user experiences, and multimodal approaches.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThe AI landscape has evolved at a breathtaking pace over the past few years, with autonomous AI agents being positioned as the next revolutionary frontier. At the 2025 AI Engineer Summit, Grace Isford, a partner at Lux Capital, delivered an <a href=\"https://www.youtube.com/watch?v=HS5a8VIKsvA\">insightful keynote</a> on “The State of the AI Frontier” that challenged the prevailing narrative about AI agents. While many industry players proclaim that 2025 marks the “perfect storm” for AI agents, Isford’s presentation offered a more nuanced view, highlighting both the tremendous progress and the significant challenges that remain. This article summarises the key insights from her keynote, examining the current state of AI agents and the strategies developers can employ to overcome persistent limitations.</p>\n<h2>\nThe Perfect Storm for AI Agents</h2>\n<p>\nThe speaker began by acknowledging the remarkable progress in AI over the past two and a half years. The industry has seen exponential advancements since the release of Stable Diffusion in August 2022, with the pace of innovation only accelerating. 
2025 has already witnessed several landmark developments:</p>\n<ul>\n <li>\nThe announcement of the $500 billion Stargate project collaboration between the U.S. government, OpenAI, SoftBank, and Oracle </li>\n <li>\nOpenAI’s o3 model exceeding human performance in the ARC-AGI challenge </li>\n <li>\nDeepSeek’s R1 model launch causing market disruptions and reaching the top of the App Store </li>\n <li>\nFrance’s new AI initiative announced at the France AI Summit, bringing Europe back into the global AI race </li>\n</ul>\n<p>\nThese developments, alongside other factors, have created what many call the “perfect storm” for AI agents:</p>\n<ol>\n <li>\nReasoning models (like OpenAI’s o1 and o3, DeepSeek’s R1, and Grok’s latest offering) now outperform humans in various benchmarks </li>\n <li>\nIncreased test-time compute (more resources allocated to inference rather than just training) </li>\n <li>\nEngineering and hardware optimisations driving efficiency </li>\n <li>\nCheaper inference and hardware costs </li>\n <li>\nA narrowing gap between open-source and closed-source models </li>\n <li>\nMassive infrastructure investments from governments and corporations worldwide </li>\n</ol>\n<h2>\nThe Reality Gap: Why AI Agents Aren’t Quite Working Yet</h2>\n<p>\nDespite this promising landscape, Isford argued that truly autonomous AI agents aren’t functioning as seamlessly as industry hype suggests. To illustrate this point, she shared a real-world example of trying to use OpenAI’s Operator to book a flight from New York to San Francisco with specific requirements. Despite seemingly straightforward criteria (departure time after 3 PM, avoiding rush hour, specific airlines, budget constraints, seat preferences), the agent failed to deliver a satisfactory result.</p>\n<p>\nThe presenter identified five categories of cumulative errors that prevent AI agents from delivering consistent, reliable results:</p>\n<ol>\n <li>\n<strong>Decision Errors</strong>: Choosing incorrect facts or overthinking/exaggerating scenarios </li>\n <li>\n<strong>Implementation Errors</strong>: Encountering access issues or integration failures (like CAPTCHA challenges) </li>\n <li>\n<strong>Heuristic Errors</strong>: Applying wrong criteria or missing critical contextual information </li>\n <li>\n<strong>Taste Errors</strong>: Failing to account for personal preferences not explicitly stated </li>\n <li>\n<strong>Perfection Paradox</strong>: User expectations heightened by AI’s capabilities in some areas lead to frustration when agents perform at merely human speed or make basic errors </li>\n</ol>\n<p>\nThese errors compound dramatically in complex multi-agent systems with multi-step tasks. Isford presented a compelling visual example showing how even agents with impressive 99% and 95% per-step accuracy rates drop to roughly 60% and 8% reliability respectively after just 50 consecutive steps.</p>\n<h2>\nFive Strategies for Building Better AI Agents</h2>\n<p>\nThe keynote then shifted to offering concrete strategies for mitigating these challenges and building more effective AI agents:</p>\n<h3>\n1. Data Curation</h3>\n<ul>\n <li>\nRecognise that data is increasingly diverse (text, images, video, audio, sensor data) </li>\n <li>\nCurate proprietary data, including data generated by the agent itself </li>\n <li>\nDesign “data flywheels” that automatically improve agent performance through user interactions </li>\n <li>\nRecycle and adapt to user preferences in real-time </li>\n</ul>\n<h3>\n2. Robust Evaluation Systems</h3>\n<ul>\n <li>\nMove beyond evaluations for verifiable domains (math, science) to develop frameworks for subjective assessments </li>\n <li>\nCollect signals about human preferences </li>\n <li>\nBuild personalised evaluation systems that reflect actual user needs </li>\n <li>\nSometimes the best evaluation is direct human testing rather than relying solely on benchmarks </li>\n</ul>\n<h3>\n3. Scaffolding Systems</h3>\n<ul>\n <li>\nImplement safeguards to prevent cascading failures when errors occur </li>\n <li>\nBuild complex compound systems that can work together harmoniously </li>\n <li>\nIncorporate human intervention at critical junctures </li>\n <li>\nDevelop self-healing agents that can recognise their own mistakes and correct course </li>\n</ul>\n<h3>\n4. User Experience as a Competitive Moat</h3>\n<ul>\n <li>\nRecognise that UX differentiation is crucial when most applications are using the same foundation models </li>\n <li>\nDeeply understand user workflows to create elegant human-machine collaboration </li>\n <li>\nIntegrate seamlessly with existing systems to deliver tangible ROI </li>\n <li>\nFocus on industries with proprietary data sources and specialised workflows (robotics, manufacturing, life sciences) </li>\n</ul>\n<h3>\n5. Multimodal Approaches</h3>\n<ul>\n <li>\nMove beyond basic chatbot interfaces to create more human-like experiences </li>\n <li>\nIncorporate multiple sensory capabilities (vision, voice, and potentially touch or smell) </li>\n <li>\nBuild personal memory systems that understand users on a deeper level </li>\n <li>\nTransform inconsistent but visionary products into experiences that exceed expectations through novel interfaces </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\nWhile 2025 has created what appears to be a perfect storm for AI agents with advanced reasoning models, increased compute efficiency, and massive infrastructure investments, the reality is that autonomous AI agents still face significant challenges. The cumulative effect of small errors across decision-making, implementation, heuristics, and user preferences creates substantial reliability issues in complex agent systems.</p>\n<p>\nHowever, as this keynote emphasised, these challenges are not insurmountable. By focusing on meticulous data curation, developing sophisticated evaluation frameworks, implementing robust scaffolding systems, prioritising distinctive user experiences, and embracing multimodal approaches, developers can build AI agents that deliver on their transformative potential. The lightning strike of truly autonomous, reliable AI agents may not have happened yet, but with these strategies, the industry is moving steadily toward that breakthrough moment.</p>\n",
"tags": [
"ai",
"machine-learning",
"llm",
"best-practices",
"evaluation",
"prompt-engineering",
"decision-making"
]
},
{
"date": "2025-02-11",
"title": "🗄️ SQLite: The Minimalist Database for AI Engineering",
"url": "/posts/sqlite-minimalist-choice-for-ai-engineering.html",
"content": "<p>\n<strong>TL;DR:</strong> SQLite offers a zero-configuration, pre-installed database solution ideal for AI engineering projects, supporting modern data structures including vectors, graphs, and JSON documents whilst providing single-file portability, ACID compliance, and broad language compatibility- making it an excellent minimalist choice when specialised database systems would be overkill.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nIn today’s AI engineering landscape, choosing the right database can feel overwhelming. While specialised solutions like <a href=\"https://qdrant.tech/\">Qdrant</a> (vectors), <a href=\"https://neo4j.com/\">Neo4j</a> (graphs), and <a href=\"https://www.mongodb.com/\">MongoDB</a> (documents) excel in their niches, there’s a compelling case for <a href=\"https://www.sqlite.org/index.html\">SQLite</a> as a versatile, minimalist solution that comes pre-installed on most systems and supports multiple data structures effectively. Speaking of minimalism, <a href=\"https://github.com/tconbeer/harlequin\">Harlequin</a> (named after a <a href=\"https://en.wikipedia.org/wiki/Harlequin_duck\">sea 🦆</a>) makes data exploration very enjoyable. Credit for the SQLite idea goes to <a href=\"https://bsky.app/profile/simonwillison.net\">Simon Willison</a>, a prolific AI researcher among others, who has been posting <a href=\"https://simonwillison.net/tags/sqlite/\">blog articles</a> and <a href=\"https://til.simonwillison.net/sqlite\">TILs</a> (Today I Learned) about it since 2003!</p>\n<h2>\nThe Power of Pre-installation</h2>\n<p>\nSQLite’s ubiquity is remarkable. 
It comes pre-installed on:</p>\n<ul>\n <li>\nmacOS </li>\n <li>\nMost Linux distributions (including Ubuntu, as evidenced by its <a href=\"https://releases.ubuntu.com/24.10/ubuntu-24.10-desktop-amd64.manifest\">manifest</a>) </li>\n <li>\nPython’s standard library </li>\n <li>\nAndroid devices </li>\n <li>\niOS devices </li>\n</ul>\n<p>\nThis universal availability means you can start developing immediately without additional setup or installation steps.</p>\n<h2>\nModern Data Structure Support</h2>\n<p>\nDespite its lightweight nature, SQLite handles modern data structures surprisingly well:</p>\n<ol>\n <li>\n<strong>Vector Storage</strong>[^1] </li>\n</ol>\n<pre><code class=\"sql\">-- `vec0` is provided by the sqlite-vec extension, which must be loaded first;\n-- the declared dimension must match the vectors inserted below (8 values each)\nCREATE VIRTUAL TABLE vec_items USING vec0(embedding float[8]);</code></pre>\n<pre><code class=\"sql\">-- vectors can be provided as JSON or in a compact binary format\nINSERT INTO vec_items(rowid, embedding)\n VALUES\n (1, '[-0.200, 0.250, 0.341, -0.211, 0.645, 0.935, -0.316, -0.924]'),\n (2, '[0.443, -0.501, 0.355, -0.771, 0.707, -0.708, -0.185, 0.362]'),\n (3, '[0.716, -0.927, 0.134, 0.052, -0.669, 0.793, -0.634, -0.162]'),\n (4, '[-0.710, 0.330, 0.656, 0.041, -0.990, 0.726, 0.385, -0.958]');</code></pre>\n<pre><code class=\"sql\">-- KNN-style query: the 3 stored vectors closest to the query vector\nSELECT\n rowid,\n distance\nFROM vec_items\nWHERE embedding MATCH '[0.890, 0.544, 0.825, 0.961, 0.358, 0.0196, 0.521, 0.175]'\nORDER BY distance\nLIMIT 3;</code></pre>\n<ol start=\"2\">\n <li>\n<strong>Graph Relationships</strong>[^2] </li>\n</ol>\n<pre><code class=\"sql\">-- Create table `nodes`\nCREATE TABLE IF NOT EXISTS nodes (\n id TEXT PRIMARY KEY,\n properties TEXT\n);</code></pre>\n<pre><code class=\"sql\">-- Create table `edges`\nCREATE TABLE IF NOT EXISTS edges (\n source TEXT,\n target TEXT,\n relationship TEXT,\n weight REAL,\n PRIMARY KEY (source, target, relationship),\n FOREIGN KEY (source) REFERENCES nodes(id),\n FOREIGN KEY (target) REFERENCES nodes(id)\n);</code></pre>\n<pre><code class=\"sql\">-- Index the `source` and `target` columns of `edges` for improved performance\nCREATE INDEX IF NOT EXISTS source_idx ON edges(source);\nCREATE INDEX IF NOT EXISTS target_idx ON edges(target);</code></pre>\n<pre><code class=\"sql\">-- Count the no. of incoming and outgoing edges per node, known as 'degree centrality'\nSELECT id,\n (SELECT COUNT(*) FROM edges WHERE source = nodes.id) +\n (SELECT COUNT(*) FROM edges WHERE target = nodes.id) as degree\nFROM nodes\nORDER BY degree DESC\nLIMIT 10;</code></pre>\n<ol start=\"3\">\n <li>\n<strong>Document Storage</strong> </li>\n</ol>\n<pre><code class=\"sql\">CREATE TABLE documents (\n id INTEGER PRIMARY KEY,\n content JSON,\n metadata JSON\n);</code></pre>\n<h2>\nPortability and Simplicity</h2>\n<p>\nOne of SQLite’s strongest features is its <a href=\"https://www.sqlite.org/onefile.html\">single-file</a> nature. Your entire database exists in one file that can be:</p>\n<ul>\n <li>\nBacked up with a simple copy operation </li>\n <li>\nEasily version controlled (for smaller databases) </li>\n <li>\nMoved between systems effortlessly </li>\n <li>\nExamined with standard SQLite tools </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\nWhile specialised databases have their place, SQLite offers a compelling combination of features that make it ideal for many AI engineering projects:</p>\n<ul>\n <li>\nZero configuration </li>\n <li>\nPre-installed availability </li>\n <li>\nSupport for multiple data structures </li>\n <li>\nSingle-file portability </li>\n <li>\nWide language support, especially in Python and Go </li>\n <li>\nACID[^3] compliance </li>\n</ul>\n<p>\n<strong>TL;DR</strong>: When you need a lightweight, self-contained database that can handle documents, vectors, and graphs without the complexity of a full database server, SQLite is often an excellent choice.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Example from <a href=\"https://alexgarcia.xyz/sqlite-vec/python.html\">sqlite-vec with Python</a></p>\n<p>\n[^2]: Examples from <a href=\"https://dev.to/stephenc222/how-to-build-lightweight-graphrag-with-sqlite-53le\">How to Build Lightweight GraphRAG with SQLite</a></p>\n<p>\n[^3]: Atomicity, Consistency, Isolation, Durability (<a href=\"https://en.wikipedia.org/wiki/ACID\">ACID</a>), per Wikipedia, <em>“is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties (which can be perceived as a single logical operation on the data) is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.”</em></p>\n",
"tags": [
"ai",
"data-modeling",
"data-processing",
"data-science",
"minimal",
"production",
"python",
"zero-config"
]
},
{
"date": "2025-02-09",
"title": "💡 TIL: To Prepare for AI, Study History's Tech Cycles",
"url": "/posts/TIL-prepare-for-ai.html",
"content": "<p>\n<strong>TL;DR:</strong> Fast.ai founder Jeremy Howard advocates studying historical technology cycles rather than attempting to predict AI’s future, recommending a practical preparation strategy that combines domain expertise with AI capabilities through self-directed learning, side projects, and community engagement- emphasising that success will come from embracing uncertainty whilst pursuing counter-cyclical opportunities.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\n<a href=\"https://jeremy.fast.ai/\">Jeremy Howard</a> isn’t just another voice in the AI conversation. As the creator of <a href=\"https://towardsdatascience.com/understanding-ulmfit-and-elmo-the-shift-towards-transfer-learning-in-nlp-b5d8e2e3f664\">ULMFiT</a> (the algorithm that modern LLMs like ChatGPT are based on), founding researcher at <a href=\"https://course.fast.ai/\">fast.ai</a>, and <a href=\"https://www.answer.ai/\">Answer.AI</a>, Howard brings a unique perspective shaped by decades at the forefront of AI development. Recently, when <a href=\"https://xcancel.com/chrisbarber/status/1888037803566747942\">asked about preparing for AI</a>, his response wasn’t about futuristic predictions or doomsday scenarios. Instead, he offered something more valuable: practical wisdom drawn from historical patterns.</p>\n<h2>\nWhy This Matters Now</h2>\n<p>\nWe’re at a critical juncture with AI, similar to where we were with the internet in 1990. Just as the internet transformed every aspect of our lives, AI is poised to do the same. The difference? We can learn from history this time. Howard’s insights are particularly valuable because they come from someone who has not only observed but shaped these technological transitions.</p>\n<h2>\nKey Insights on Technology Evolution</h2>\n<p>\nHoward emphasises a crucial pattern: technology doesn’t just grow linearly. 
Each innovation follows a “hockey stick” growth curve before flattening into a sigmoid.</p>\n<center>\n <figure> <img src=\"https://raw.githubusercontent.com/ai-mindset/ai-mindset.github.io/0a6eced3bce4c70b7ba715fe7873d1659ce2e9a9/images/hockey-stick-growth.png\" width=\"80%\" height=\"80%\" /> <figcaption>Hockey stick growth</figcaption> </figure></center>\n<p>\nMore importantly, new “hockey sticks” emerge unexpectedly in different areas. This pattern repeats “like clockwork”, making historical understanding more valuable than future predictions.</p>\n<h2>\nPractical Preparation Strategy</h2>\n<p>\nRather than trying to predict AI’s future, Howard advocates for:</p>\n<ul>\n <li>\nEmbracing uncertainty while avoiding both dismissive fear and blind hype </li>\n <li>\nTaking a counter-cyclical approach: pursuing opportunities others overlook </li>\n <li>\nInvesting months in mastering AI tools, accepting initial poor results as part of the learning process </li>\n <li>\nCombining AI capabilities with deep domain expertise </li>\n <li>\nBuilding practical knowledge through side projects and community engagement </li>\n</ul>\n<h2>\nThe Education Perspective</h2>\n<p>\nHoward challenges traditional educational paths, suggesting alternatives:</p>\n<ul>\n <li>\nSelf-directed learning through resources like <a href=\"https://course.fast.ai/\">fast.ai</a> </li>\n <li>\nMultiple side hustles to build practical experience </li>\n <li>\nCommunity building with like-minded innovators </li>\n <li>\nUsing AI itself to learn technical skills </li>\n <li>\nDeveloping both technical and human skills as a generalist </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\nThe key takeaway isn’t about predicting AI’s future; it’s about preparing for it intelligently. Howard’s message is that success in the AI era won’t come from perfect predictions or traditional career paths. Instead, it will come from practical engagement, continuous learning, and the ability to combine domain expertise with AI capabilities. As he puts it, those who master this combination will have “superpowers” compared to those who don’t adapt.</p>\n<p>\nThe most valuable insight? Even AI experts can’t predict AI’s future reliably. The best strategy is to engage deeply with the technology while maintaining a grounded, practical approach to learning and application. The future belongs to the tinkerers, the experimenters, and those willing to learn from both past and present.</p>\n",
"tags": [
"til",
"ai",
"fast-ai",
"llm",
"machine-learning",
"best-practices",
"decision-making",
"evolution"
]
},
{
"date": "2025-01-26",
"title": "🚀 A Minimal, Pragmatic Approach to Production-Ready AI & ML with Go",
"url": "/posts/go-pragmatic-modern-development.html",
"content": "<p>\n<strong>TL;DR:</strong> Go offers a refreshingly minimal approach to AI and ML development with its concise 47-page specification, zero-configuration toolchain, and functional equivalents to key Python ML libraries- providing explicit error handling, enforced code consistency, and cross-platform capabilities whilst reducing cognitive overhead and team friction in production environments.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nModern software development often involves navigating complex toolchains, opinionated frameworks, and resource-heavy development environments. Many languages require extensive configuration, multiple runtime dependencies, and introduce significant cognitive overhead through their vast feature sets and multiple approaches to solving the same problem. Node.js, JVM languages, and even Python with its extensive ecosystem can lead to analysis paralysis, code inhomogeneity and team disagreements over tooling and style.\\ Go offers a refreshing alternative. With a language specification under 50 pages, a consolidated toolchain, and a “batteries included” approach, it provides a low-cognitive-overhead solution for developers seeking simplicity and productivity. Its zero-config philosophy, coupled with built-in formatting (<code class=\"inline\">go fmt</code>), linting[^1] (<code class=\"inline\">go vet</code>), and testing tools, promotes code uniformity and reduces team friction over stylistic choices. The sizeable Go community is centralised, using Slack in this case, which serves as a focal point for communication, support, networking, and staying informed about the latest developments.\\ While Go may lack a REPL as sophisticated as IPython or the Julia interactive environment, this limitation encourages proper Test-Driven Development practices rather than the post-implementation testing often seen in REPL-heavy environments. 
Tools like <a href=\"https://github.com/fatih/vim-go\">vim-go</a>’s <code class=\"inline\">:GoRun</code> and Go Playground provide sufficient interactive development capabilities for most use cases.</p>\n<p>\nBelow I’m collecting some thoughts on attractive aspects of Go I’ve discerned so far and how they compare with other languages I’ve considered. The list of Go’s features is far from complete; for example, I’ve not mentioned goroutines.</p>\n<h2>\nPython vs Go Libraries Comparison</h2>\n<table>\n <thead>\n <tr>\n <th>Domain</th>\n <th>Python Library</th>\n <th>Go Equivalent</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>Numerical Computing</td>\n <td><a href=\"https://github.com/numpy/numpy\">NumPy</a></td>\n <td><a href=\"https://github.com/gonum/gonum\">gonum</a></td>\n </tr>\n <tr>\n <td>Data Processing</td>\n <td><a href=\"https://github.com/pandas-dev/pandas\">Pandas</a></td>\n <td><a href=\"https://github.com/go-gota/gota\">gota</a></td>\n </tr>\n <tr>\n <td>Visualisation</td>\n <td><a href=\"https://github.com/plotly/plotly.py\">Plotly</a></td>\n <td><a href=\"https://github.com/MetalBlueberry/go-plotly\">go-plotly</a></td>\n </tr>\n <tr>\n <td>Gradient Boosting</td>\n <td><a href=\"https://github.com/dmlc/xgboost\">XGBoost</a></td>\n <td><a href=\"https://github.com/Unity-Technologies/go-xgboost\">go-xgboost</a></td>\n </tr>\n <tr>\n <td>Machine Learning</td>\n <td><a href=\"https://github.com/scikit-learn/scikit-learn\">Scikit-Learn</a></td>\n <td><a href=\"https://github.com/sjwhitworth/golearn\">golearn</a></td>\n </tr>\n <tr>\n <td>Deep Learning</td>\n <td><a href=\"https://github.com/tensorflow/tensorflow\">TensorFlow</a><br><a href=\"https://github.com/pytorch/pytorch\">PyTorch</a></td>\n <td><a href=\"https://github.com/galeone/tfgo\">tfgo</a><br><a href=\"https://github.com/sugarme/gotch\">gotch</a></td>\n </tr>\n <tr>\n <td>LLM Development</td>\n <td><a href=\"https://github.com/langchain-ai/langchain\">LangChain</a></td>\n <td><a href=\"https://github.com/tmc/langchaingo\">langchaingo</a></td>\n </tr>\n <tr>\n <td>Vector Search</td>\n <td><a href=\"https://github.com/weaviate/weaviate-python-client\">Weaviate Client</a></td>\n <td><a href=\"https://github.com/weaviate/weaviate-go-client\">Weaviate Go Client</a></td>\n </tr>\n </tbody>\n</table>\n<p>\n<em>Update: <a href=\"https://github.com/Promacanthus/awesome-golang-ai\">Awesome Golang.ai</a> is a very nice curated list of AI-related Go libraries worth checking.</em></p>\n<h2>\nDevelopment Experience</h2>\n<p>\nGo’s tooling is exceptional. With <a href=\"https://github.com/fatih/vim-go\">vim-go</a> in <a href=\"https://neovim.io/\">Neovim</a>, you get immediate access to formatting, linting, and code navigation. Unlike JVM languages or JavaScript frameworks that may require more complex build configurations, Go projects maintain a simple, predictable structure thanks to <code class=\"inline\">go mod</code>. The <code class=\"inline\">go fmt</code> command (triggered on save by default in vim-go) enforces consistent code style, eliminating debates over formatting, while <code class=\"inline\">go vet</code> catches common mistakes early.</p>\n<h2>\nError Handling Done Right</h2>\n<p>\nGo’s approach to error handling initially feels verbose:</p>\n<pre><code class=\"go\">result, err := someFunction()\nif err != nil {\n return err\n}</code></pre>\n<p>\nBut this explicitness pays dividends. By treating errors as values that must be handled, Go forces developers to think about failure cases upfront. The <code class=\"inline\">defer</code> keyword complements this by ensuring clean-up code runs regardless of errors:</p>\n<pre><code class=\"go\">file, err := os.Open(&quot;data.txt&quot;)\nif err != nil {\n return err\n}\ndefer file.Close()</code></pre>\n<h2>\nML/AI Capabilities</h2>\n<p>\nWhile Go isn’t the primary choice for ML/AI experimentation, its simplicity and performance make it excellent for production deployments. 
Its standard library and growing ecosystem provide solid foundations for numerical computing (<a href=\"https://github.com/gonum/gonum\">gonum</a>), data processing (<a href=\"https://github.com/go-gota/gota\">gota</a>), and ML/AI applications (<a href=\"https://github.com/gorgonia/gorgonia\">Gorgonia</a>, <a href=\"https://github.com/galeone/tfgo\">tfgo</a>, <a href=\"https://github.com/sugarme/gotch\">gotch</a>). The language’s focus on simplicity and performance makes it particularly suitable for model serving and inference workloads.</p>\n<h2>\nLanguage Design</h2>\n<p>\nGo’s refreshingly concise specification (under 50 pages) contrasts sharply with other languages. Even the highly promising Zig, a younger language half of Go’s age, has a 74-page specification despite being positioned as a simpler low-level language.</p>\n<figure>\n <img src=\"https://raw.githubusercontent.com/ai-mindset/ai-mindset.github.io/refs/heads/main/images/Zig%20language%20spec.png\" width=\"80%\" height=\"80%\"/> <figcaption>Zig's language spec</figcaption></figure>\n<p>\nGo’s intentionally limited feature set and single way of solving problems promote maintainable, uniform code that’s easier to reason about and review, as reflected in its compact language spec.</p>\n<figure>\n <img src=\"https://raw.githubusercontent.com/ai-mindset/ai-mindset.github.io/refs/heads/main/images/Go%20language%20spec.png\" width=\"80%\" height=\"80%\"/> <figcaption>Go's language spec</figcaption></figure>\n<p>\nFor ML engineers and developers seeking a reliable, low-overhead language that excels at building robust, production-ready applications, Go offers a compelling choice. 
While it won’t replace Python for rapid prototyping and research, its simplicity, performance, and consolidated toolchain make it a very compelling addition to any developer’s toolkit.</p>\n<h2>\nConclusion</h2>\n<p>\nTo me, Go stands out as a pragmatic choice for modern development through its key strengths:</p>\n<ul>\n <li>\nMinimal cognitive overhead with a 47-page specification </li>\n <li>\nZero-config toolchain including formatting, testing, and package management </li>\n <li>\nCentralised community, providing a single source of truth </li>\n <li>\nEnforced error handling and clean resource management via <code class=\"inline\">defer</code> </li>\n <li>\nGrowing ML/AI ecosystem with counterparts to many of Python’s established libraries </li>\n <li>\nCross-platform compilation and efficient garbage collection </li>\n <li>\nSingle, clear way to solve problems, reducing team friction </li>\n <li>\nLightweight development environment compared to JVM, .NET, BEAM or Node.js </li>\n</ul>\n<p>\nWhile Python remains dominant for ML/AI research, prototyping and, frequently, production, Go excels in production environments where code maintainability, performance, and team collaboration are crucial. Its intentionally limited feature set, combined with a comprehensive standard library and maturing ML ecosystem, makes it a very attractive choice for developers seeking simplicity without sacrificing capability.</p>\n<p>\nThe language’s design philosophy strongly aligns with my needs as a Data professional looking to reduce tooling complexity and maintain consistent, reliable codebases. Go’s lightweight yet rich toolchain allows writing safe, efficient AI and data-oriented code based on simplicity and reliability. This refreshing alternative in today’s complex development landscape has strongly tempted me to start moving my practice to Go’s more principled approach.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Vet is, in essence, a linter, since it helps improve code quality. Quoting Go’s <a href=\"https://go.dev/src/cmd/vet/doc.go\">vet doc</a>: <em>“Vet examines Go source code and reports suspicious constructs, such as Printf calls whose arguments do not align with the format string. Vet uses heuristics that do not guarantee all reports are genuine problems, but it can find errors not caught by the compilers.”</em></p>\n",
"tags": [
"ai",
"go",
"llm",
"minimal",
"machine-learning",
"toolchain",
"zero-config",
"code-quality",
"cross-platform",
"production"
]
},
{
"date": "2025-01-21",
"title": "🔧 A 5-Minute Guide to Engineering Machine Learning Systems",
"url": "/posts/ml-best-practices.html",
"content": "<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThis is a concise reference guide distilling Martin Zinkevich’s <a href=\"https://developers.google.com/machine-learning/guides/rules-of-ml\">influential Google article on machine learning best practices</a>. While the original spans 43 detailed rules, this 10-minute summary captures the essential principles for building production ML systems. Whether you’re starting a new project or reviewing an existing one, this summary can be used as a practical checklist for engineering-focused machine learning.</p>\n<h2>\nCore Philosophy</h2>\n<blockquote>\n <p>\nDo machine learning like the great engineer you are, not like the great > machine learning expert you aren’t. </p>\n</blockquote>\n<p>\nMost ML gains come from great features, not algorithms. The basic approach should be:</p>\n<ol>\n <li>\nEnsure solid end-to-end pipeline </li>\n <li>\nStart with reasonable objective </li>\n <li>\nAdd common-sense features simply </li>\n <li>\nMaintain pipeline integrity </li>\n</ol>\n<h2>\nPhase I: Before Machine Learning (Rules #1-3)</h2>\n<ol>\n <li>\n <p>\n<strong>Don’t be afraid to launch without ML</strong> </p>\n <ul>\n <li>\nSimple heuristics get you 50% of the way - Launch with heuristics when data is insufficient - Example: Use install rate for app ranking </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>First, design and implement metrics</strong> </p>\n <ul>\n <li>\nTrack everything possible in current system - Get early permission from users - Design systems with metric instrumentation - Implement experiment framework </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Choose ML over complex heuristics</strong> </p>\n <ul>\n <li>\nSimple heuristics for launching - Complex heuristics become unmaintainable - ML models are easier to maintain long-term </li>\n </ul>\n </li>\n</ol>\n<h2>\nPhase II: First Pipeline (Rules #4-11)</h2>\n<ol>\n <li>\n <p>\n<strong>Keep first model simple, get infrastructure right</strong> </p>\n <ul>\n 
<li>\nFocus on data pipeline integrity </li>\n <li>\nDefine clear evaluation metrics </li>\n <li>\nPlan model integration carefully </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Pipeline Health is Critical</strong> </p>\n <ul>\n <li>\nTest infrastructure independently </li>\n <li>\nMonitor freshness requirements </li>\n <li>\nWatch for silent failures </li>\n <li>\nGive feature columns owners </li>\n <li>\nDocument feature expectations </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Starting Your ML System</strong> </p>\n <ul>\n <li>\nTest getting data into algorithm </li>\n <li>\nTest getting models out correctly </li>\n <li>\nMonitor data statistics continuously </li>\n <li>\nBuild alerting system </li>\n </ul>\n </li>\n</ol>\n<h2>\nYour First Objective (Rules #12-15)</h2>\n<ol>\n <li>\n <p>\n<strong>Choose Objectives Wisely</strong> </p>\n <ul>\n <li>\nDon’t overthink initial objective choice </li>\n <li>\nStart with simple, observable metrics </li>\n <li>\nUse directly observed user behaviours </li>\n <li>\nExample: clicks, downloads, shares </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Model Selection Guidelines</strong> </p>\n <ul>\n <li>\nStart with interpretable models </li>\n <li>\nSeparate spam filtering from quality ranking </li>\n <li>\nUse simple linear models initially </li>\n <li>\nMake debugging easier </li>\n </ul>\n </li>\n</ol>\n<h2>\nPhase III: Feature Engineering (Rules #16-22)</h2>\n<ol>\n <li>\n <p>\n<strong>Plan to launch and iterate</strong> </p>\n <ul>\n <li>\nExpect regular model updates </li>\n <li>\nDesign for feature flexibility </li>\n <li>\nKeep infrastructure clean </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Feature Engineering Principles</strong> </p>\n <ul>\n <li>\nStart with directly observed features </li>\n <li>\nUse cross-product features wisely </li>\n <li>\nClean up unused features </li>\n <li>\nScale feature complexity with data </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Feature Coverage and Quality</strong> </p>\n <ul>\n <li>\nFeatures that generalise across contexts </li>\n <li>\nMonitor feature coverage </li>\n <li>\nDocument feature ownership </li>\n <li>\nRegular feature clean-up </li>\n </ul>\n </li>\n</ol>\n<h2>\nHuman Analysis (Rules #23-28)</h2>\n<ol>\n <li>\n <p>\n<strong>Testing and 
Validation</strong> </p>\n <ul>\n <li>\nUse crowdsourcing or live experiments </li>\n <li>\nMeasure model deltas explicitly </li>\n <li>\nLook for error patterns </li>\n <li>\nConsider long-term effects </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Common Pitfalls</strong> </p>\n <ul>\n <li>\nEngineers aren’t typical users </li>\n <li>\nBeware of confirmation bias </li>\n <li>\nQuantify undesirable behaviours </li>\n </ul>\n </li>\n</ol>\n<h2>\nTraining-Serving Skew (Rules #29-37)</h2>\n<ol>\n <li>\n <p>\n<strong>Prevent Skew</strong> </p>\n <ul>\n <li>\nSave serving-time features </li>\n <li>\nWeight sampled data properly </li>\n <li>\nReuse code between training/serving </li>\n <li>\nTest on future data </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Monitor Everything</strong> </p>\n <ul>\n <li>\nTrack performance metrics </li>\n <li>\nWatch data distributions </li>\n <li>\nMonitor feature coverage </li>\n <li>\nCheck prediction bias </li>\n </ul>\n </li>\n</ol>\n<h2>\nPhase IV: Optimisation and Complex Models (Rules #38-43)</h2>\n<ol>\n <li>\n <p>\n<strong>When to Add Complexity</strong> </p>\n <ul>\n <li>\nAfter simple approaches plateau </li>\n <li>\nWhen objectives are well-aligned </li>\n <li>\nIf maintenance cost justifies gains </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Advanced Techniques</strong> </p>\n <ul>\n <li>\nKeep ensembles simple </li>\n <li>\nLook for new information sources </li>\n <li>\nBalance complexity vs. benefits </li>\n </ul>\n </li>\n</ol>\n<h2>\nFinal Recommendations</h2>\n<ol>\n <li>\n <p>\n<strong>Launch Decisions</strong> </p>\n <ul>\n <li>\nConsider multiple metrics </li>\n <li>\nUse proxies for long-term goals </li>\n <li>\nBalance simple vs. complex </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>System Evolution</strong> </p>\n <ul>\n <li>\nStart simple, add complexity gradually </li>\n <li>\nMonitor consistently </li>\n <li>\nKeep infrastructure clean </li>\n <li>\nDocument everything </li>\n </ul>\n </li>\n</ol>\n",
"tags": [
"machine-learning",
"best-practices",
"mlops",
"monitoring",
"production",
"quality-assurance",
"data-science",
"decision-making"
]
},
{
"date": "2025-01-14",
"title": "🤖 Understanding AI Agents: Tools, Planning, and Evaluation",
"url": "/posts/agents-chip-huyen.html",
"content": "<p>\n<strong>TL;DR:</strong> Chip Huyen’s analysis of AI agents explores how they combine foundation models with specialised tools (knowledge augmentation, capability extension, and write actions), planning mechanisms (ReAct, Reflexion), and evaluation frameworks to accomplish complex tasks whilst highlighting challenges in tool selection, planning efficiency, and error management.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThis article summarises Chip Huyen’s comprehensive blog post “<a href=\"https://huyenchip.com//2025/01/07/agents.html\">Agents</a>“ adapted from her upcoming book AI Engineering (2025). The original piece provides an in-depth examination of intelligent agents, which represent a fundamental concept in AI, defined by Russell and Norvig in their seminal 1995 book <a href=\"https://en.wikipedia.org/wiki/Artificial_Intelligence:_A_Modern_Approach\">Artificial Intelligence: A Modern Approach</a> as anything that can perceive its environment through sensors and act upon it through actuators. Huyen explores how the unprecedented capabilities of foundational models have transformed theoretical possibilities into practical applications, enabling agents to operate in diverse environments -from digital workspaces for coding to physical settings for robotics. These agents can now assist with tasks ranging from website creation to complex negotiations.</p>\n<h2>\nUnderstanding Agents and Their Tools</h2>\n<p>\nAn agent’s effectiveness is determined by two key factors: its environment and its tool inventory. The environment defines the scope of possible actions, while tools enable the agent to perceive and act within this environment. Modern agents leverage three distinct categories of tools.\\ Knowledge augmentation tools, including text retrievers and web browsing capabilities, prevent model staleness by enabling access to current information. 
However, web browsing tools require careful API selection to protect against unreliable or harmful content. Capability extension tools address inherent model limitations, for instance by providing calculators for precise arithmetic or code interpreters for programming tasks. These interpreters demand robust security measures to prevent code injection attacks.</p>\n<p>\nWrite actions represent the most powerful and potentially risky category, enabling agents to modify databases or send emails. These tools are distinguished from read-only actions by their ability to affect the environment directly. The <a href=\"https://arxiv.org/abs/2304.09842\">Chameleon</a> system demonstrated the power of tool augmentation, achieving an 11.37% improvement on ScienceQA (a science question answering task) and 17% on TabMWP (a tabular math problem-solving task) through strategic tool combination.</p>\n<center>\n <figure> <a href=\"https://huyenchip.com//2025/01/07/agents.html\"><img src=\"https://huyenchip.com/assets/pics/agents/8-tool-transition.png\" width=\"80%\" height=\"80%\"/></a> <figcaption>A tool transition tree by Chameleon</figcaption> </figure></center>\n<h2>\nPlanning and Execution Strategies</h2>\n<p>\nEffective planning requires balancing granularity and flexibility. While <a href=\"https://arxiv.org/abs/2302.04761\">Toolformer</a> managed with 5 tools and <a href=\"https://arxiv.org/abs/2304.09842\">Chameleon</a> with 13, <a href=\"https://arxiv.org/abs/2305.15334\">Gorilla</a> attempted to handle 1,645 APIs, illustrating the complexity of tool selection. Plans can be expressed either in natural language or specific function calls, each approach offering different advantages in maintainability and precision.</p>\n<p>\nFoundation model planners require minimal training but need careful prompting, while reinforcement learning planners demand extensive training for robustness. 
Modern planning systems support multiple control flows: sequential, parallel, conditional, and iterative patterns. The <a href=\"https://arxiv.org/abs/2210.03629\">ReAct</a> framework successfully combines reasoning with action.</p>\n<center>\n <figure> <a href=\"https://huyenchip.com//2025/01/07/agents.html\"><img src=\"https://huyenchip.com/assets/pics/agents/5-ReAct.png\" width=\"80%\" height=\"80%\"/></a> <figcaption>ReAct agent</figcaption> </figure></center>\n<p>\n<a href=\"https://arxiv.org/abs/2303.11366\">Reflexion</a>, by contrast, separates evaluation and self-reflection for improved performance.</p>\n<center>\n <figure> <a href=\"https://huyenchip.com//2025/01/07/agents.html\"><img src=\"https://huyenchip.com/assets/pics/agents/6-reflexion.png\" width=\"80%\" height=\"80%\"/></a> <figcaption>Reflexion agent</figcaption> </figure></center>\n<h2>\nReflection and Error Management</h2>\n<p>\nContinuous reflection and error correction form the backbone of reliable agent systems. The process begins with query validation, continues through plan assessment, and extends to execution monitoring. Chameleon’s tool transition analysis shows how tools are commonly used together, while Voyager’s skill manager builds on this by tracking and reusing successful tool combinations.</p>\n<h2>\nEvaluation Framework</h2>\n<p>\nAgent evaluation requires a comprehensive approach to failure mode analysis. Planning failures might involve invalid tools or incorrect parameters, while tool-specific failures demand targeted analysis. Efficiency metrics must consider not just step count and costs, but also completion time constraints. When comparing AI and human agents, it’s essential to recognise their different operational patterns: what’s efficient for one may be inefficient for the other. 
Working with domain experts helps identify missing tools and validate performance metrics.</p>\n<h2>\nConclusion</h2>\n<p>\nHuyen’s analysis demonstrates that successful AI agents emerge from the careful orchestration of three key elements: strategic tool selection, sophisticated planning mechanisms, and robust evaluation frameworks. While tools dramatically enhance agent capabilities -as evidenced by Chameleon’s significant performance improvements- their effectiveness depends on thoughtful curation, balancing between Toolformer’s minimal approach and Gorilla’s extensive API integration. The integration of planning frameworks like ReAct and Reflexion shows how combining reasoning with action and incorporating systematic reflection can enhance agent performance. However, as an emerging field without established theoretical frameworks, significant challenges remain in tool selection, planning efficiency, and error management. Future developments will focus on agent framework evaluation and memory systems for handling information beyond context limits, while maintaining the delicate balance between capability and control that Huyen emphasises throughout her analysis.</p>\n",
"tags": [
"ai",
"llm",
"prompt-engineering",
"system-prompts",
"evaluation",
"best-practices",
"toolchain",
"machine-learning"
]
},
{
"date": "2025-01-10",
"title": "💡 TIL: A Simple Yet Effective Ensemble Technique called Model Soup 🍲",
"url": "/posts/TIL-model-soups.html",
"content": "<p>\n<strong>TL;DR:</strong> Model soups provide a computationally efficient ensemble technique by averaging the weights of similarly trained neural networks, outperforming both individual models and traditional prediction-averaging ensembles while maintaining single-model inference speed.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nWhile most ensemble methods in machine learning combine model predictions, thanks to <a href=\"https://bsky.app/profile/chrisalbon.com/post/3lfbbixka7c25\">Chris Albon</a> I recently learned about an alternative approach called “<em>model soups</em>“ that works directly with model parameters. Instead of aggregating outputs, model soups blend the actual weights and biases of neural networks, showing promising results in computer vision and language tasks.</p>\n<center>\n <a href=\"https://bsky.app/profile/chrisalbon.com/post/3lfbbixka7c25\"><img src=\"https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:umpsiyampiq3bpgce7kigydz/bafkreihvr4b4gid7v6y7karhiusawtqfdbhoen2bt6q55pmugyioj3q3gq@jpeg\" width=\"80%\" height=\"80%\"/></a></center>\n<h2>\nMain Concept</h2>\n<p>\nModel soups are created by averaging the parameters (weights and biases) of multiple independently trained neural networks that share the same architecture and training setup. For example, if we have three models with weights 2.32, <br>\n4.21, and 1.23 for a particular parameter, the “souped” model would use (2.32 + <br>\n4.21 + 1.23) / 3 = 2.587 for that parameter. This process is repeated across all <br>\nparameters in the network. However, not all parameter combinations lead to improvements -models typically need similar training datasets, optimisation methods, and hyperparameters (like learning rate and batch size) to blend effectively. 
When done right, parameter-averaged models can outperform both individual networks and traditional prediction-averaging ensembles, while maintaining the inference speed of a single model.</p>\n<h2>\nConclusion</h2>\n<p>\nModel soups challenge our intuitions about neural networks by showing that directly averaging weights can produce better results than averaging predictions. While the technique requires careful consideration of training conditions, it provides a computationally efficient way to combine multiple models into a single network, making it particularly valuable for resource-constrained production environments where running multiple models in parallel isn’t feasible.</p>\n",
"tags": [
"neural-network",
"machine-learning",
"performance",
"mlops",
"production",
"evaluation"
]
},
{
"date": "2025-01-09",
"title": "📐 Sparse Autoencoders: A Technical Overview",
"url": "/posts/sparse-autoencoders.html",
"content": "<p>\n<strong>TL;DR:</strong> Sparse autoencoders are neural networks that learn efficient data representations by reconstructing their input while enforcing neuron inactivity constraints, combining reconstruction error, weight decay, and KL-divergence sparsity penalties to automatically extract interpretable features without manual engineering.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nSupervised learning has achieved remarkable successes in areas ranging from computer vision to genomics. However, as Andrew Ng points out in his <a href=\"https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf\">CS294A lecture notes</a>, it faces a fundamental limitation: the need for manually engineered features. While researchers have spent years crafting specialised features for vision, audio, and text processing, this approach neither scales nor generalises well. Sparse autoencoders offer an elegant solution to this challenge by automatically learning features from unlabelled data. These neural networks are distinguished by two key characteristics:</p>\n<ol>\n <li>\nThey attempt to reconstruct their input, forcing them to capture essential </li>\n <li>\nThey employ a sparsity constraint that mimics biological neural systems, <br>\ndata patterns where neurons fire infrequently and selectively </li>\n</ol>\n<p>\nWhile simple implementations may not outperform hand-engineered features in specific domains like computer vision, their strength lies in their generality and biological plausibility. The sparse coding principle has proven effective across diverse domains including audio, text, and visual processing.\\ The mathematical framework combines reconstruction error, regularisation, and sparsity penalties to learn efficient, interpretable representations. This approach not only advances machine learning capabilities but also provides insights into how biological neural networks might learn and process information. 
This overview examines the mathematical foundations, practical implementation, and emergent properties of sparse autoencoders, following the framework presented in Stanford’s CS294A course notes.</p>\n<h2>\nSparse Autoencoders</h2>\n<p>\nAn autoencoder is a neural network that learns to reconstruct its input. In a sparse autoencoder, we add a critical biological constraint: neurons should be “inactive” most of the time, mimicking how biological neurons exhibit low average firing rates.</p>\n<p>\nThe basic architecture is:</p>\n<pre><code>Input (x) -> Hidden Layer (sparse activation) -> Output (x̂)</code></pre>\n<p>\nWhere:</p>\n<ul>\n <li>\nInput and output dimensions are equal $(x, \\hat{x} \\in \\mathbb{R}^n)$ </li>\n <li>\nHidden layer learns a sparse representation </li>\n <li>\nNetwork uses sigmoid activation: $f(z) = \\frac{1}{1+e^{-z}}$ </li>\n</ul>\n<h2>\nMathematical Framework</h2>\n<ol>\n <li>\n <p>\n<strong>Base Cost Function</strong> (single training example): </p>\n <p>\n$$ J(W,b; x,y) = \\frac{1}{2}||h_{W,b}(x) - y||^2 $$ </p>\n <p>\nFor a single training example:</p>\n <ul>\n <li>\nMeasures reconstruction error between network output $h_{W,b}(x)$ and target $y$ </li>\n <li>\nFor autoencoders: $y = x$ (we reconstruct the input) </li>\n <li>\nThe $\\frac{1}{2}$ factor simplifies gradient computations </li>\n <li>\nSquared L2 norm penalises larger reconstruction errors quadratically </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Full Cost Function with Weight Decay</strong>: </p>\n <p>\nThe cost function $J(W,b)$ combines the average reconstruction error </p>\n <p>\n$$ \\frac{1}{m}\\sum_{i=1}^m \\frac{1}{2}||h_{W,b}(x^{(i)}) - x^{(i)}||^2 $$ </p>\n <p>\nwith the weight decay regularisation, which prevents overfitting by penalising large weights: </p>\n <p>\n$$ \\frac{\\lambda}{2}\\sum_{l=1}^{n_l-1}\\sum_{i=1}^{s_l}\\sum_{j=1}^{s_{l+1}}(W_{ji}^{(l)})^2 $$ </p>\n <p>\n$$ J(W,b) = \\left[\\frac{1}{m}\\sum_{i=1}^m \\frac{1}{2}||h_{W,b}(x^{(i)}) - y^{(i)}||^2\\right] + \\frac{\\lambda}{2}\\sum_{l=1}^{n_l-1}\\sum_{i=1}^{s_l}\\sum_{j=1}^{s_{l+1}}(W_{ji}^{(l)})^2 $$ </p>\n <p>\nKey points:</p>\n <ul>\n <li>\nFor autoencoders, the target $y^{(i)}$ equals the input $x^{(i)}$ </li>\n <li>\nWeight decay applies only to weights $W$, not biases $b$ </li>\n <li>\n$\\lambda$ balances reconstruction accuracy vs. weight magnitude </li>\n <li>\nThe $\\frac{1}{2}$ factor simplifies derivative calculations in backpropagation </li>\n <li>\nThis regularisation is distinct from the sparsity constraint (KL divergence term) </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Sparsity Measurement</strong>: </p>\n <p>\nThe average activation $\\hat{\\rho}_j$ measures how frequently hidden unit $j$ fires across the training set: </p>\n <p>\n$$ \\hat{\\rho}_j = \\frac{1}{m}\\sum_{i=1}^m\\left[a_j^{(2)}(x^{(i)})\\right] $$ </p>\n <p>\nKey points:</p>\n <ul>\n <li>\n$a_j^{(2)}(x^{(i)})$ is hidden unit $j$’s activation for input $x^{(i)}$ </li>\n <li>\nWith sigmoid activation, values near 1 mean “active” or “firing”, and values near 0 mean “inactive” </li>\n <li>\nWe constrain $\\hat{\\rho}_j \\approx \\rho$ where $\\rho$ is small (typically 0.05) </li>\n <li>\nThis enforces selective firing: each neuron responds strongly to specific input patterns </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Sparsity Penalty</strong> (using <a href=\"https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence\">KL divergence</a>): </p>\n <p>\nThe sparsity penalty uses KL divergence to enforce $\\hat{\\rho}_j \\approx \\rho$: </p>\n <p>\n$$ \\sum_{j=1}^{s_2}\\left[\\rho\\log\\frac{\\rho}{\\hat{\\rho}_j} + (1-\\rho)\\log\\frac{1-\\rho}{1-\\hat{\\rho}_j}\\right] $$ </p>\n <p>\nProperties of this penalty:</p>\n <ul>\n <li>\nMinimised (zero) when $\\hat{\\rho}_j = \\rho$ </li>\n <li>\nMonotonically increases as $\\hat{\\rho}_j$ deviates from $\\rho$ </li>\n <li>\nBecomes infinite as $\\hat{\\rho}_j$ approaches 0 or 1 </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Final Cost Function</strong>: </p>\n <p>\n$$ J_{\\mathrm{sparse}}(W,b) = J(W,b) + \\beta\\sum_{j=1}^{s_2}\\mathrm{KL}(\\rho\\|\\hat{\\rho}_j) $$ </p>\n <p>\nComponents:</p>\n <ul>\n <li>\n$J(W,b)$: Standard autoencoder cost (reconstruction error + weight decay) </li>\n <li>\nSparsity term: KL divergence penalty summed over $s_2$ hidden units </li>\n </ul>\n <p>\n$\\beta$ controls:</p>\n <ul>\n <li>\nBalance between accurate reconstruction and sparse representation </li>\n <li>\nStrength of sparsity enforcement </li>\n <li>\nHigher $\\beta$ → stronger sparsity constraint </li>\n </ul>\n <p>\nThis formulation naturally penalises both over- and under-activation of hidden units relative to target sparsity $\\rho$. </p>\n </li>\n</ol>\n<h2>\nTraining Process</h2>\n<p>\nThe key modification to standard backpropagation occurs in the hidden layer:</p>\n<p>\n$$ \\delta_i^{(2)} = \\left(\\left(\\sum_{j=1}^{s_3}W_{ji}^{(2)}\\delta_j^{(3)}\\right) + \\beta\\left(-\\frac{\\rho}{\\hat{\\rho}_i} + \\frac{1-\\rho}{1-\\hat{\\rho}_i}\\right)\\right)f'(z_i^{(2)}) $$</p>\n<p>\nWhere:</p>\n<ul>\n <li>\nFirst term: Standard backpropagation gradient through the network </li>\n <li>\nSecond term: Gradient of KL-divergence sparsity penalty </li>\n <li>\n$z_i^{(2)}$ is the weighted input sum to hidden unit $i$; both terms are scaled by $f'(z_i^{(2)})$ </li>\n <li>\n$\\hat{\\rho}_i$ must be pre-computed using the full training set </li>\n</ul>\n<p>\nThis modification ensures gradient descent optimises both reconstruction accuracy and sparsity.</p>\n<h2>\nPractical Guidelines</h2>\n<ul>\n <li>\n$\\rho$ ≈ 0.05 (5% target activation rate) </li>\n <li>\n$\\beta$ controls sparsity penalty strength </li>\n <li>\nInitialise weights randomly near zero </li>\n <li>\nMust compute forward pass on all examples first to calculate $\\hat{\\rho}$ </li>\n</ul>\n<h2>\nResults</h2>\n<p>\nWhen trained on images, the network naturally learns edge detectors at different orientations, similar to what is found in the visual cortex. This emergence of biologically plausible features validates the sparsity approach.</p>\n<h2>\nConclusion</h2>\n<p>\nSparse autoencoders represent a mathematically principled approach to unsupervised feature learning, combining biological inspiration with rigorous optimisation techniques. 
Their key innovation lies in the sparsity constraint, implemented through KL divergence, which forces hidden units to develop specialised, interpretable features.</p>\n<p>\nThe mathematical framework achieves this through three key components:</p>\n<ol>\n <li>\nA reconstruction cost that ensures faithful data representation </li>\n <li>\nA weight decay term that prevents overfitting </li>\n <li>\nA sparsity penalty that enforces selective neural activation </li>\n</ol>\n<p>\nThis formulation has proven successful in practice, typically leading to:</p>\n<ul>\n <li>\nEdge and feature detectors emerging naturally from visual data </li>\n <li>\nInterpretable representations comparable to biological neural coding </li>\n <li>\nRobust feature learning even with <a href=\"https://en.wikipedia.org/wiki/Overcompleteness\">overcomplete</a> hidden layers </li>\n</ul>\n<p>\nThe practical value of sparse autoencoders extends beyond their theoretical elegance: they provide a foundation for understanding how neural networks can learn meaningful data representations without supervision. Their success in learning biologically plausible features validates both their design principles and their potential for advanced machine learning applications. Their main limitation lies in hyperparameter sensitivity, particularly to the sparsity target ρ and weight β, requiring careful tuning for optimal performance.</p>\n",
"tags": [
"ai",
"llm",
"neural-network",
"machine-learning",
"data-science",
"linear-algebra",
"statistics",
"evaluation",
"interpretability",
"modelling-mindsets",
"design-principles",
"best-practices",
"data-processing"
]
},
{
"date": "2025-01-09",
"title": "🔍 Understanding LLM Interpretability",
"url": "/posts/interpreting-llms.html",
"content": "<p>\n<strong>TL;DR:</strong> LLMs present unique interpretability challenges due to neurons exhibiting polysemanticity- responding to multiple unrelated concepts through superposition- which sparse autoencoders help address by mapping neuron combinations to specific concepts, enhancing our ability to understand, control, and improve these increasingly influential AI systems.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nLarge Language Models (LLMs) have become increasingly sophisticated, yet understanding their inner workings remains a critical challenge for AI safety and development. This blog post summarises concepts and research presented in <a href=\"https://www.youtube.com/watch?v=UGO_Ehywuxc\">Welch Labs’ video on mechanistic interpretability</a>, examining how LLMs process information and recent advances in making their decision-making processes more transparent.</p>\n<h2>\nHow LLMs Think</h2>\n<p>\nLLMs process text through a sophisticated pipeline:</p>\n<ol>\n <li>\nText is converted into tokens and mapped to vectors </li>\n <li>\nThese vectors flow through multiple layers via “<em>residual streams</em>“ </li>\n <li>\nEach layer transforms the information through attention mechanisms </li>\n <li>\nFinal outputs emerge from probability distributions across possible tokens </li>\n</ol>\n<p>\nThis process, while mathematically precise, creates a black box of neural connections that resist simple interpretation.</p>\n<h2>\nThe Challenge of Model Transparency</h2>\n<p>\n<a href=\"https://ai.google.dev/gemma\">Google Gemma</a> models’ analysis of the sentence “<em>the reliability of Wikipedia is very</em>“ demonstrates this complexity. 
The model assigns varying probabilities to different completions:</p>\n<ul>\n <li>\n“<em>important</em>” (20.21%) </li>\n <li>\n“<em>high</em>” (11.16%) </li>\n <li>\n“<em>questionable</em>” (9.48%) </li>\n</ul>\n<p>\nThese probabilities emerge from intricate interactions between neurons, leading to a phenomenon called <em>superposition</em>[^1].</p>\n<h2>\nSuperposition and Its Solution</h2>\n<p>\nUnlike vision models where neurons correspond to specific concepts, LLMs exhibit <a href=\"https://arxiv.org/abs/2210.01892\">polysemanticity</a>: individual neurons respond to multiple, unrelated concepts. This occurs because LLMs encode more concepts than available neurons by using specific neuron combinations.</p>\n<p>\nThis complexity necessitated the development of <a href=\"/posts/sparse-autoencoders.html\">sparse autoencoders</a>, which:</p>\n<ol>\n <li>\nMap complex neuron combinations to specific concepts </li>\n <li>\nExtract interpretable features from LLMs </li>\n <li>\nEnable direct manipulation of model behaviour </li>\n</ol>\n<h2>\nPractical Implications</h2>\n<p>\nUnderstanding LLM internals has crucial implications:</p>\n<ul>\n <li>\n<strong>AI Safety</strong>: Better control over model behaviours and outputs </li>\n <li>\n<strong>Development</strong>: More targeted improvements in model capabilities </li>\n <li>\n<strong>Deployment</strong>: Enhanced ability to predict and prevent unwanted behaviours </li>\n <li>\n<strong>Trust</strong>: Greater transparency in AI decision-making processes </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\nWhile tools like sparse autoencoders have provided unprecedented insights into model behaviour, they’ve also revealed the vast complexity of LLM internal mechanisms, the “dark matter” of AI. 
As these models become more integral to society, advancing our ability to interpret and control them becomes increasingly critical for responsible AI development.</p>\n<p>\nThis improved understanding represents not just academic progress, but a crucial step toward safer, more reliable AI systems.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Superposition, in the context of neural networks, is the ability of a single neuron to represent multiple features simultaneously. <a href=\"https://hdl.handle.net/1721.1/157073\">https://hdl.handle.net/1721.1/157073</a></p>\n",
"tags": [
"ai",
"llm",
"machine-learning",
"neural-network",
"model-governance",
"interpretability"
]
},
{
"date": "2025-01-08",
"title": "💡 TIL: The Matrix Equation That Makes Linear Regression Work",
"url": "/posts/TIL-lin-alg-applied-to-stats.html",
"content": "<p>\n<strong>TL;DR:</strong> Linear regression can be elegantly solved using the matrix equation β = (X^TX)^(-1)X^Ty, which mathematically guarantees minimum squared error by accounting for feature correlations- though real-world applications often favour gradient descent due to the direct solution’s computational complexity, numerical instability with correlated features, and memory constraints.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThis morning <a href=\"https://xcancel.com/andrew_n_carr/status/1876855682529480844\">an interesting interview question</a> motivated me to remind myself how it’s possible to solve linear regression through matrix algebra. Below is what I learned:</p>\n<h2>\nThe Theory: An Elegant Mathematical Solution</h2>\n<p>\nLinear regression finds the best-fit line through data points by finding optimal coefficients ($\\beta$) that minimise squared errors. The equation $\\beta = (X^TX)^{-1}X^Ty$ elegantly solves this optimisation problem using matrix algebra.</p>\n<p>\nThe solution involves these key components:</p>\n<ol>\n <li>\n$X$ is our feature matrix (n samples × p features) </li>\n <li>\n$y$ is our target values (n × 1) </li>\n <li>\n$X^T$ is the transpose of X </li>\n <li>\n$\\beta$ is our solution vector (p × 1) of coefficients </li>\n</ol>\n<p>\nHere’s how this elegant solution works:</p>\n<ol>\n <li>\n <p>\n$X^TX$ creates a $(p \\times p)$ matrix of feature products: </p>\n <ul>\n <li>\nEach element $(i,j)$ contains the dot product between features $i$ and $j$ - When features are centred, these products are proportional to covariances[^1] - When features are also standardised, it yields correlations scaled by $n$ </li>\n </ul>\n </li>\n <li>\n <p>\n$(X^TX)^{-1}$ computes the inverse of this matrix: </p>\n <ul>\n <li>\nCompensates for feature correlations in coefficient calculations[^2] - Required for solving the normal equations $X^TX\\beta = X^Ty$ - Exists only when no feature is a linear combination of others 
</li>\n </ul>\n </li>\n <li>\n <p>\n$X^Ty$ creates a $(p \\times 1)$ vector of feature-target products: </p>\n <ul>\n <li>\nEach element $i$ contains the dot product of feature $i$ with target $y$ </li>\n <li>\nRepresents raw feature-target relationships before adjustment </li>\n <li>\nWhen centred, proportional to feature-target covariances[^3] </li>\n </ul>\n </li>\n <li>\n <p>\nFinal multiplication $(X^TX)^{-1}X^Ty$: </p>\n <ul>\n <li>\nSolves the normal equations $X^TX\\beta = X^Ty$ </li>\n <li>\nAccounts for inter-feature correlations in determining coefficients </li>\n <li>\nYields the coefficients with minimum squared error whenever $X^TX$ is invertible </li>\n </ul>\n </li>\n</ol>\n<p>\nFor more information, check Hastie, Tibshirani & Friedman’s seminal book “<a href=\"https://archive.org/details/elementsofstatis0000hast\">The Elements of Statistical Learning</a>”.</p>\n<h2>\nThe Real-World Catch</h2>\n<p>\nWhile mathematically elegant, this direct solution has practical limitations in real-world applications:</p>\n<ol>\n <li>\n<em>Computational Complexity</em>: Computing $(X^TX)^{-1}$ requires $O(p^3)$ operations (plus $O(np^2)$ to form $X^TX$ in the first place), becoming prohibitively expensive for large feature sets. This is why gradient descent, with its $O(np)$ per-iteration cost, often proves more practical. </li>\n <li>\n<em>Numerical Instability</em>: When features are highly correlated (like monthly and annual income), $X^TX$ becomes nearly singular[^4]. Even small rounding errors in the computation of its inverse can lead to large errors in $\\beta$. In extreme cases, when features are perfectly correlated, the inverse doesn’t exist at all. Gradient descent avoids this matrix inversion entirely. </li>\n <li>\n<em>Memory Constraints</em>: The direct solution requires holding the entire $p \\times p$ matrix $X^TX$ in memory, while gradient descent can work with mini-batches, making it more memory-efficient. 
</li>\n</ol>\n<h2>\nConclusion</h2>\n<p>\nWhile this equation brilliantly demonstrates the power of linear algebra in statistics, real-world machine learning often favours gradient descent’s iterative approach. Think of it as choosing between a perfect GPS route through heavy traffic (direct solution) versus taking smaller, adaptable steps through clear side streets (gradient descent). Both reach the same destination, but the practical path often wins in real-world conditions.</p>\n<hr class=\"thin\">\n<p>\n[^1]: When features are centred (mean = 0), each product becomes $n-1$ times the sample covariance. This means $X^TX$ captures how features vary together, which is crucial because correlated features can lead to unstable coefficients if not accounted for. The relationship between $X^TX$ and covariance comes from the definition of sample covariance: $\\operatorname{cov}(X_i, X_j) = \\frac{1}{n-1}\\sum_{k=1}^n (x_{ki} - \\bar{x}_i)(x_{kj} - \\bar{x}_j)$. When data is centred, this simplifies to $\\frac{1}{n-1}(X^TX)_{ij}$, so $\\frac{X^TX}{n-1}$ is the sample covariance matrix. This matters because a) when features are uncentred, $X^TX$ gives the sum of products, b) when centred, $\\frac{X^TX}{n-1}$ gives covariances, and c) when also standardised (std = 1), $\\frac{X^TX}{n-1}$ gives correlations.</p>\n<p>\n[^2]: Adjusts coefficient estimates to account for shared information between features. For example, if height and weight are correlated, we need to determine each variable’s unique contribution to the prediction, not their overlapping effect.</p>\n<p>\n[^3]: When centred, each element becomes $n-1$ times the sample covariance between a feature and the target. This reveals how each feature individually relates to $y$ before accounting for other features’ effects, providing a starting point for determining final coefficients.</p>\n<p>\n[^4]: A matrix is singular (or non-invertible) when its determinant is zero. 
In practical terms, this means one or more columns can be expressed as linear combinations of other columns.</p>\n",
"tags": [
"data-science",
"machine-learning",
"statistics",
"ai",
"linear-algebra",
"til",
"modelling-mindsets",
"data-modeling"
]
},
{
"date": "2025-01-08",
"title": "💡 TIL: How Different Societies View and Value Choice",
"url": "/posts/TIL-the-art-of-choice.html",
"content": "<p>\n<strong>TL;DR:</strong> Sheena Iyengar’s cross-cultural research reveals that how we perceive and respond to choice varies dramatically between societies- with evidence showing that sometimes having fewer choices or allowing others to choose for us can lead to better outcomes, challenging the widely-held Western belief that more individual choice is always beneficial.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nToday I revisited a talk on <a href=\"https://www.youtube.com/watch?v=lDq9-QxvsNU\">the art of choosing</a> by Sheena Iyengar. A humourous and informative presentation, it reminded me that our assumptions about choice –as studied by Prof. Iyengar through research spanning American, European and Asian populations– reveals fascinating cultural differences in how we perceive and respond to choice. Her research reveals some eye-opening insights that I’ll briefly summarise below.</p>\n<h2>\nPerceiving Choice</h2>\n<p>\nFirst, while Americans believe individual choice is sacred (think “have it your way”), research shows this isn’t universal. When studying children solving puzzles, Asian-American children actually performed better when their mothers chose for them, while Anglo-American children did better choosing for themselves. This reveals how deeply cultural context shapes not just our preferences, but the actual effectiveness of our choices.</p>\n<p>\nSecond, remember how overwhelming it feels staring at 50 different breakfast cereals? Turns out, people from post-communist countries often saw seven different sodas as just one choice: “soda or no soda.” This isn’t because they’re less sophisticated, it’s because the ability to spot tiny differences between products is a learned skill -not a natural one.</p>\n<p>\nMost striking was the research on medical decisions. 
In a comparison of American and French parents making end-of-life decisions for infants, American parents reported more negative emotions and guilt despite insisting on having the choice, while French parents, whose doctors made the decisions, coped better. This challenges the core American belief that having choice is always better.</p>\n<p>\nConcluding with a personal story, Prof. Iyengar, who is blind, shared how she once brought two “clearly different” shades of pink nail polish to her lab. When she removed the labels, half the participants couldn’t tell them apart. Those who could chose differently when the labels were present versus absent, showing how marketing narratives shape what we think we’re choosing.</p>\n<h2>\nConclusion</h2>\n<p>\nThe TL;DR is: Through cross-cultural research, Prof. Iyengar shows that how we understand and value choice varies dramatically across cultures. Sometimes, having fewer choices or letting others choose for us might actually lead to better outcomes.<br>\nAs a technologist, inundated with a very wide choice of tools that often offer similar results, I have made the conscious decision to reduce my tooling footprint to the minimum viable toolstack possible. I’m happy to let more knowledgeable professionals choose, with <em>adequate justification</em>, tools for my line of work, but I do disagree with the zealotry that’s occasionally observed in tech and complemented by big egos.</p>\n",
"tags": [
"til",
"decision-making",
"best-practices",
"evaluation",
"statistics",
"design-principles",
"modelling-mindsets"
]
},
{
"date": "2025-01-07",
"title": "💊 Lessons for Modern Drug Development from the Golden Age of Antibiotics",
"url": "/posts/golden-age-of-antibiotics.html",
"content": "<p>\n<strong>TL;DR:</strong> Despite our vastly superior modern technology, antibiotic development has dramatically declined since the post-WWII “Golden Age” (1940s-1960s) that produced most antibiotic classes we still use today-highlighting how scientific capability alone cannot drive progress without three key elements working together: economic incentives that correct market failures, institutional coordination, and systematic application of technological tools.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nIn a thought-provoking analysis[^1], Our World in Data reveals a striking paradox in medical progress: the most productive period in antibiotic development occurred in the two decades following World War II, with scientific capabilities far more limited than today. This “Golden Age of Antibiotics” (1940s-1960s) produced nearly two-thirds of the antibiotic drug classes we still rely on[^2].\\ Even more surprisingly, since 1970 -despite exponential advances in computing power and biotechnology- only eight new classes of antibiotics have been approved[^2]. This indicates a stark decline that threatens the foundation of modern medicine. Traditional screening methods now rediscover existing compounds most of the time rather than finding new ones[^3].\\ Modern tools like genome sequencing and systematic screening methods offer unprecedented capabilities. We’ve only identified a small fraction of bacterial species, many of which could harbour new antibiotic compounds[^2]. Yet despite these capabilities, development has stagnated due to fundamental market failures and fragmented research efforts.\\ This article examines this paradoxical inverse relationship between technological capability and antibiotic development: How did the Golden Age achieve such remarkable success with limited tools? Why has progress slowed as our capabilities have grown? 
Most importantly, what combinations of economic incentives and modern technology could spark a new era of antibiotic discovery?</p>\n<h2>\nWhen Urgency Met Innovation</h2>\n<p>\nThe Golden Age of Antibiotics stands as medicine’s most productive period in antimicrobial discovery, yielding over 20 new antibiotic classes, more than double what we’ve developed in the 50 years since[^2]. Three pivotal breakthroughs, coupled with unprecedented coordination, drove this remarkable success.<br>\nThe foundation was laid by Paul Ehrlich’s systematic approach to drug discovery. By methodically testing hundreds of compounds, he discovered <a href=\"https://en.wikipedia.org/wiki/Arsphenamine\">salvarsan</a> in 1910, the first synthetic antibiotic that effectively treated syphilis[^2]. A second milestone emerged when Alexander Fleming discovered penicillin in 1928. However, the real innovation came through coordinated wartime effort. With infections being the second-most common cause of hospital admissions in the US Army, the U.S. Office of Scientific Research and Development (OSRD) launched a global search for more productive penicillin strains, ultimately finding a high-yielding strain on a cantaloupe[^4].<br>\nThe third breakthrough came from Selman Waksman’s insight into soil bacteria. His discovery that soil-dwelling <a href=\"https://en.wikipedia.org/wiki/Actinomycetales\">actinomycetes</a> naturally produce antibiotics led to the development of <a href=\"https://en.wikipedia.org/wiki/Streptomycin\">streptomycin</a> and opened an entirely new avenue for antibiotic discovery[^5].<br>\nWhat transformed these breakthroughs into a “golden age” was unprecedented coordination. The U.S. War Production Board orchestrated collaboration between government, academia, and industry: removing patent restrictions, sharing data, and streamlining clinical trials[^6]. 
The results were remarkable: some antibiotics, like <a href=\"https://en.wikipedia.org/wiki/Tetracycline_antibiotics\">tetracyclines</a> and <a href=\"https://en.wikipedia.org/wiki/Macrolide\">macrolides</a>, went from discovery to clinical use within the same year.</p>\n<h2>\nScientific Progress and Market Failure</h2>\n<p>\nThe contrast between the Golden Age and our current era reflects a fundamental misalignment between public health needs and market incentives[^2]. The market structure fundamentally disfavours antibiotics in two ways:</p>\n<ol>\n <li>\nRevenue Structure: While chronic disease medications can generate billions in annual revenue over decades, new antibiotics typically generate only tens of millions annually[^7], by comparison. This revenue gap has driven many large pharmaceutical companies away from antibiotic development[^8]. </li>\n <li>\nConservation Requirements: New antibiotics must be reserved for severe drug-resistant infections, reaching less than 1% of hospitalised patients[^7]. This necessary conservation practice severely limits market potential. </li>\n</ol>\n<p>\nMeanwhile, our technological capabilities offer three particularly promising approaches:</p>\n<ol>\n <li>\nGenome mining: a breakthrough technique that identifies hidden antibiotic genes in microbes that remain dormant under standard laboratory conditions. This computational approach has already yielded promising candidates like humimycins[^9]. </li>\n <li>\nAdvanced bacterial exploration: research into extreme environments like deep oceans and deserts, where previously “unculturable” bacteria might harbour entirely new antibiotic classes[^3]. </li>\n <li>\nSmart combination strategies: exploiting the observation that bacterial resistance to one antibiotic can increase vulnerability to others, opening new therapeutic possibilities[^10]. </li>\n</ol>\n<p>\nYet these powerful tools remain underutilised due to insufficient investment and coordination. 
The challenge isn’t scientific capability; it’s the failure to create systems that effectively deploy these technologies within sustainable economic frameworks.</p>\n<h2>\nIntegrating Economics and Technology</h2>\n<p>\nDrawing from evidence in antibiotic development research, several promising approaches could help overcome current market failures while leveraging modern technological capabilities[^7].</p>\n<h3>\nEconomic Solutions to Market Failures</h3>\n<ol>\n <li>\nSubscription Models: The UK has pioneered a system where healthcare systems pay annual fees for antibiotic access rather than per-volume pricing. This addresses both the revenue challenge and conservation requirements by providing stable income while supporting appropriate antibiotic use[^7]. </li>\n <li>\nAdvance Market Commitments: These provide guaranteed payments to companies that successfully develop new antibiotics, similar to successful vaccine development programs. This directly addresses the revenue uncertainty that has driven companies away from antibiotic development[^11]. </li>\n <li>\nCollaborative Funding Initiatives: Organisations like CARB-X and GARDP help smaller companies navigate costly clinical trials, distributing development risks that large pharmaceutical companies are unwilling to bear[^7]. </li>\n</ol>\n<h3>\nLeveraging Modern Technology</h3>\n<p>\nTo maximise the impact of these economic incentives, three technological approaches show particular promise:</p>\n<ol>\n <li>\nSystematic Genome Mining: Using computational power to identify promising antibiotic-producing genes in bacterial genomes, revealing compounds that traditional screening would miss[^9]. </li>\n <li>\nEnvironmental Exploration: Research into extreme environments could unlock entirely new antibiotic classes, enabled by modern sequencing technologies[^3]. 
</li>\n <li>\nSmart Combination Strategies: Systematic exploration of how resistance to one antibiotic can increase vulnerability to others, offering new therapeutic possibilities[^10]. </li>\n</ol>\n<h2>\nConclusion</h2>\n<p>\nThe story of antibiotic development demonstrates that scientific capability alone cannot drive progress. The Golden Age succeeded through a powerful combination of systematic approaches, unprecedented collaboration, and removal of institutional barriers, even with limited technological tools[^2].<br>\nToday’s challenge is fundamentally different. We possess sophisticated tools, from genome mining to advanced screening methods, yet development has stalled. This paradox reveals that progress requires three key elements working in concert: economic incentives, institutional coordination, and technological application[^7].<br>\nThe evidence-based solutions presented in the original Our World in Data article[^1] offer a path forward. Market reforms like subscription models and advance market commitments could help correct the fundamental economic misalignment in antibiotic development[^8]. Meanwhile, systematic application of computational tools, genomic analysis, and bacterial exploration could help unlock new classes of antibiotics that traditional methods miss[^9].<br>\nThe urgency is clear. Antimicrobial resistance threatens to undermine many advances in modern medicine[^12]. However, by combining proven coordination approaches from the Golden Age with modern capabilities and sustainable economic frameworks, we can revitalise antibiotic development for the challenges ahead.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Our World in Data (2024). “What was the Golden Age of Antibiotics, and how can we spark a new one?” <a href=\"https://ourworldindata.org/golden-age-antibiotics\">https://ourworldindata.org/golden-age-antibiotics</a></p>\n<p>\n[^2]: Hutchings, M. I., Truman, A. W., & Wilkinson, B. (2019). Antibiotics: Past, present and future. 
Current Opinion in Microbiology, 51, 72–80. <a href=\"https://doi.org/10.1016/j.mib.2019.10.008\">https://doi.org/10.1016/j.mib.2019.10.008</a></p>\n<p>\n[^3]: Kolter, R., & Van Wezel, G. P. (2016). Goodbye to brute force in antibiotic discovery? Nature Microbiology, 1(2), 15020. <a href=\"https://doi.org/10.1038/nmicrobiol.2015.20\">https://doi.org/10.1038/nmicrobiol.2015.20</a></p>\n<p>\n[^4]: Gaynes, R. (2017). The Discovery of Penicillin – New Insights After More Than 75 Years of Clinical Use. Emerging Infectious Diseases, 23(5), 849–853. <a href=\"https://doi.org/10.3201/eid2305.161556\">https://doi.org/10.3201/eid2305.161556</a></p>\n<p>\n[^5]: Waksman, S. A., & Schatz, A. (1945). Streptomycin–Origin, Nature, and Properties. Journal of the American Pharmaceutical Association, 34(11), 273–291. <a href=\"https://doi.org/10.1002/jps.3030341102\">https://doi.org/10.1002/jps.3030341102</a></p>\n<p>\n[^6]: Sampat, B. N. (2023). Second World War and the Direction of Medical Innovation. SSRN Electronic Journal. <a href=\"https://doi.org/10.2139/ssrn.4422261\">https://doi.org/10.2139/ssrn.4422261</a></p>\n<p>\n[^7]: Årdal, C., et al. (2020). Antibiotic development – economic, regulatory and societal challenges. Nature Reviews Microbiology, 18(5), 267–274. <a href=\"https://doi.org/10.1038/s41579-019-0293-3\">https://doi.org/10.1038/s41579-019-0293-3</a></p>\n<p>\n[^8]: Renwick, M. J., Brogan, D. M., & Mossialos, E. (2016). A systematic review and critical assessment of incentive strategies for discovery and development of novel antibiotics. The Journal of Antibiotics, 69(2), 73–88. <a href=\"https://doi.org/10.1038/ja.2015.98\">https://doi.org/10.1038/ja.2015.98</a></p>\n<p>\n[^9]: Chu, J., et al. (2016). Discovery of MRSA active antibiotics using primary sequence from the human microbiome. 
Nature Chemical Biology, 12(12), 1004–1006. <a href=\"https://doi.org/10.1038/nchembio.2207\">https://doi.org/10.1038/nchembio.2207</a></p>\n<p>\n[^10]: Baym, M., Stone, L. K., & Kishony, R. (2016). Multidrug evolutionary strategies to reverse antibiotic resistance. Science, 351(6268), aad3292. <a href=\"https://doi.org/10.1126/science.aad3292\">https://doi.org/10.1126/science.aad3292</a></p>\n<p>\n[^11]: Kremer, M., Levin, J., & Snyder, C. M. (2020). Advance Market Commitments: Insights from Theory and Experience. AEA Papers and Proceedings, 110, 269–273. <a href=\"https://www.aeaweb.org/articles?id=10.1257/pandp.20201017\">https://www.aeaweb.org/articles?id=10.1257/pandp.20201017</a></p>\n<p>\n[^12]: World Health Organization (2024). Antimicrobial Resistance Fact Sheet. <a href=\"https://www.who.int/news-room/fact-sheets/detail/antimicrobial-resistance\">https://www.who.int/news-room/fact-sheets/detail/antimicrobial-resistance</a></p>\n",
"tags": [
"iterative-refinement",
"evolution",
"data-science",
"evaluation",
"decision-making",
"best-practices",
"modelling-mindsets",
"production"
]
},
{
"date": "2025-01-02",
"title": "💡 TIL: Test-Driven Development Is Key to Better LLM System Prompts",
"url": "/posts/TIL-tdd-good-system-prompts.html",
"content": "<p>\n<strong>TL;DR:</strong> Anthropic’s approach to system prompt development parallels test-driven development-first creating test cases where default model behaviour fails, then developing prompts that pass these tests, followed by iterative refinement-highlighting how robust automated evaluation is not merely a quality check but the foundation for building reliable LLM applications.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\n2024 has made clear that writing good automated evaluations for LLM-powered systems is the most critical skill for building useful applications. This insight parallels Anthropic’s internal approach to system prompt development. As usual, Simon Willison’s <a href=\"https://simonwillison.net/2024/Dec/31/llms-in-2024/#evals-really-matter\">recent insightful 2024 LLM overview</a> was a treasure trove. One item I picked up on was evaluating system prompts using a test-driven approach.</p>\n<h2>\nThe Evaluation-First Approach</h2>\n<p>\n<a href=\"https://askell.io/\">Amanda Askell</a>, leading fine-tuning at Anthropic, <a href=\"https://xcancel.com/amandaaskell/status/1866207266761760812\">outlines a test-driven process</a> for system prompts:</p>\n<ol>\n <li>\nCreate a test set of messages where the model’s default behaviour fails to </li>\n <li>\nDevelop a system prompt that passes these tests </li>\n <li>\nIdentify cases where the system prompt is misapplied and refine it </li>\n <li>\nExpand the test set and repeat <br>\nmeet requirements </li>\n</ol>\n<p>\nThis methodology’s importance extends beyond prompt engineering. Companies with strong evaluation suites can adopt new models faster and build more reliable features than competitors. 
As <a href=\"https://xcancel.com/cramforce/status/1860436022347075667\">Vercel’s experience demonstrates</a>, moving from complex prompt protection to robust testing enables rapid iteration and development.</p>\n<h2>\nConclusion</h2>\n<p>\nWhile everyone acknowledges evals’ importance, implementing them effectively remains challenging. The key insight is clear: robust automated evaluation isn’t just a quality check, it’s the foundation for building reliable LLM-powered systems.</p>\n",
"tags": [
"ai",
"llm",
"til",
"prompt-engineering",
"testing",
"best-practices",
"evaluation",
"machine-learning",
"system-prompts"
]
},
{
"date": "2024-12-24",
"title": "📝 From Vim to VSCode to Neovim",
"url": "/posts/vscode-to-neovim.html",
"content": "<p>\n<strong>TL;DR:</strong> For me, Neovim strikes the perfect balance between Vim’s simplicity and VSCode’s features. After wrestling with VSCode’s keyboard input failures on Fedora and its resource demands, I found that Neovim’s single configuration file, robust plugins, and cross-platform reliability make it ideal for my Python, Deno, and Clojure development needs. Sometimes stepping back to move forward is exactly what we need!</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nVim’s portable <code class=\"inline\">.vimrc</code> embodies software minimalism at its best. One file, one minute to setup, resulting in a complete development environment. This simplicity served me well until Azure development motivated the use of VSCode.\\ While VSCode worked reasonably well on macOS, Fedora revealed its constraints: keyboard input failures, heavy resource usage, and <a href=\"https://stackoverflow.com/questions/35368889/how-can-i-export-settings\">complex environment portability</a> compared to Vim’s <code class=\"inline\">vim +PlugInstall</code>. These limitations drove my search for tools that could maintain simplicity while meeting my development requirements with simplicity and portability in mind.</p>\n<h2>\nVim -> VSCode -> Neovim</h2>\n<p>\nAzure development initially pulled me into VSCode’s ecosystem. While stable on macOS, Fedora revealed deal-breakers: random keyboard input failures that only responded to command palette (Ctrl+Shift+P). No amount of configuration resets or reinstalls resolved these issues.</p>\n<p>\nThis instability, coupled with VSCode’s resource footprint, led me to Neovim. 
The timing aligned with my exploration of Clojure, where Neovim’s Conjure plugin offered a compelling Lisp development experience that rivalled Emacs.</p>\n<p>\nMy requirements were specific:</p>\n<ul>\n <li>\nA lightweight Python IDE </li>\n <li>\nA lightweight Deno IDE </li>\n <li>\nA lightweight Clojure IDE </li>\n</ul>\n<p>\nThrough <a href=\"{{ site.baseurl }}{% link _posts/2024-11-15-dialogue-engineering.md %}\">Dialogue Engineering</a>, I crafted a complete IDE using a <a href=\"https://github.com/ai-mindset/init.vim\">single configuration file</a>. Neovim’s mixed ecosystem of package managers and dual Vimscript/Lua support presents a learning curve, but the resulting environment is fast, stable, and precisely tailored to my needs. One minor drawback is the complexity of adding colour to Conjure’s output, especially when compared to the rich REPL experiences offered by <a href=\"https://ipython.org/\">IPython</a>, <a href=\"https://deno.com/\">Deno</a>, and Clojure with <a href=\"https://github.com/bhauman/rebel-readline\">rebel-readline</a>.</p>\n<h2>\nConclusion</h2>\n<p>\nThe journey from Vim to VSCode and finally to Neovim reflects a common pattern in software development: sometimes we need to step backward to move forward. While VSCode offered modern IDE features, its stability and resource issues on Linux highlighted the enduring value of minimal, portable tools.<br>\nNeovim strikes an elegant balance: it preserves Vim’s philosophy of simplicity and portability while providing modern IDE capabilities. Despite minor challenges with REPL colourisation, its single configuration file approach and robust plugin ecosystem make it a powerful choice for polyglot development. For developers who value both minimal tooling and modern features, Neovim proves that we don’t always have to choose between the two.</p>\n",
"tags": [
"minimal",
"cross-platform",
"toolchain",
"best-practices",
"design-principles",
"python",
"deno",
"zero-config"
]
},
{
"date": "2024-12-23",
"title": "💡 TIL: Exploring OpenAI's API with Swagger",
"url": "/posts/TIL-openai-openapi.html",
"content": "<p>\n<strong>TL;DR:</strong> You can easily explore OpenAI’s complete API documentation by loading their GitHub-hosted OpenAPI YAML file directly into Swagger’s web interface. This approach lets you interactively examine all endpoints, request/response schemas, and test functionality—a valuable reference for anyone building services that need to maintain compatibility with OpenAI’s API structure.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nOpenAI maintains a comprehensive <a href=\"https://github.com/openai/openai-openapi/\">OpenAPI specification</a> that documents their entire API surface. While browsing through their GitHub repository, <a href=\"https://simonwillison.net/\">Simon Willison</a>[^1] discovered you can easily explore this spec using Swagger’s web interface.</p>\n<h2>\nThe Discovery</h2>\n<p>\nWillison recently highlighted a neat trick: you can browse OpenAI’s full API documentation by loading their <a href=\"https://github.com/openai/openai-openapi/blob/master/openapi.yaml\">OpenAPI YAML file</a> directly into <a href=\"https://petstore.swagger.io/?url=https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/master/openapi.yaml#/\">Swagger’s web UI</a>.</p>\n<h2>\nWhy This Matters</h2>\n<p>\nThis approach offers several advantages:</p>\n<ul>\n <li>\nInteractive exploration of all API endpoints </li>\n <li>\nComplete request/response schemas </li>\n <li>\nBuilt-in testing capability </li>\n <li>\nDetailed parameter documentation </li>\n</ul>\n<p>\nFor developers working with AI APIs, this provides a valuable reference point- especially when building services that need to maintain compatibility with OpenAI’s API structure.</p>\n<h2>\nTry It Yourself</h2>\n<p>\nVisit the <a href=\"https://petstore.swagger.io/\">Swagger UI</a> and paste this URL:\\ <code class=\"inline\">https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/master/openapi.yaml</code></p>\n<hr class=\"thin\">\n<p>\n[^1]: Co-founder 
of</p>\n<pre><code>[Lanyrd](https://blog.natbat.net/post/61658401806/lanyrd-from-idea-to-exit),\nco-creator of [Django](https://simonwillison.net/2005/Jul/17/django/) and\n[Datasette](https://datasette.io/) and a prolific independent AI researcher</code></pre>\n",
"tags": [
"ai",
"llm",
"openai",
"openapi",
"spec"
]
},
{
"date": "2024-12-19",
"title": "🎛️ A Practical Guide to Fine-tuning LLMs with InstructLab",
"url": "/posts/instructlab-and-rag.html",
"content": "<p>\n<strong>TL;DR:</strong> InstructLab democratises LLM fine-tuning through its structured LAB methodology, offering three hardware-adaptive QLoRA-based training pipelines (Simple, Full, and Accelerated) that enable organisations to create domain-specific models without massive computing resources whilst maintaining comprehensive evaluation frameworks.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThe explosion of Large Language Models (LLMs) has created a pressing need for domain-specific adaptations. While base models like GPT-4, Claude, and Llama demonstrate impressive general capabilities, organisations often need models that excel in specific domains or exhibit particular behavioural traits. This customisation typically requires fine-tuning, a process that has historically demanded significant expertise, computational resources, and sophisticated infrastructure.</p>\n<h3>\nThe Fine-tuning Challenge</h3>\n<p>\nTraditional LLM fine-tuning presents a complex web of interconnected challenges that organisations must navigate. At its core lies the need for sophisticated infrastructure, often requiring specialised hardware and carefully orchestrated software stacks. This infrastructure challenge is compounded by substantial computational costs, making experimentation and iteration expensive.\\ The data challenge is equally significant. Fine-tuning demands large, high-quality datasets that are both rare and expensive to create. Even when such datasets exist, organisations face the risk of catastrophic forgetting, where models lose their general capabilities while acquiring new ones. 
Moreover, validating improvements remains a complex task, requiring careful benchmarking and evaluation frameworks.<br>\nThese challenges have historically restricted fine-tuning to well-resourced organisations, creating a significant barrier to entry for smaller teams and organisations seeking to adapt LLMs to their specific needs.</p>\n<h3>\nReal-world Challenges</h3>\n<p>\nThe adaptation of LLMs to specific domains presents organisations with a multifaceted set of practical challenges. In healthcare, medical institutions grapple with the need for models that can accurately process and generate content using complex medical terminology while maintaining strict clinical protocols. This domain expertise challenge extends beyond mere vocabulary; it encompasses understanding of medical procedures, drug interactions, and diagnostic reasoning.<br>\nThe financial sector faces equally demanding requirements, particularly around compliance and regulation. Banks and financial institutions must ensure their models operate within specific regulatory frameworks, making decisions that are not only accurate but also auditable and explainable to regulatory bodies.<br>\nData quality emerges as a persistent challenge across sectors. Organisations typically struggle with historical datasets that exhibit inconsistent formatting, missing values, and inherent biases. The challenge extends to maintaining proper version control and data lineage tracking, crucial for both compliance and model improvement cycles.<br>\nRegulatory constraints add another layer of complexity. Healthcare organisations must ensure strict HIPAA compliance in their model development and deployment processes. Similarly, any organisation handling European data must adhere to GDPR requirements, while specific industries often face additional certification needs. 
These regulatory requirements must be considered not just in the final deployment but throughout the entire fine-tuning process.</p>\n<h3>\nThe Role of InstructLab</h3>\n<p>\n<a href=\"https://instructlab.ai/\">InstructLab</a> emerges as a systematic solution to these challenges, offering a novel approach to LLM fine-tuning that combines:</p>\n<ul>\n <li>\nSynthetic data generation for high-quality training examples </li>\n <li>\nEfficient <a href=\"https://arxiv.org/abs/2305.14314\">QLoRA</a>-based training pipelines </li>\n <li>\nComprehensive evaluation frameworks </li>\n <li>\nHardware-adaptive processing </li>\n</ul>\n<p>\nThe rest of this article will elaborate on <a href=\"https://instructlab.ai/\">InstructLab</a>’s architecture, workflow, and practical considerations, demonstrating how it makes LLM fine-tuning accessible while maintaining rigorous quality standards. It will explore how organisations can leverage this tool to enhance their AI capabilities efficiently and systematically.</p>\n<h2>\nFrom Principles to Practice</h2>\n<p>\n<a href=\"https://instructlab.ai/\">InstructLab</a> is built around the LAB (Large-Scale Alignment for ChatBots) methodology, leveraging <a href=\"https://arxiv.org/abs/2305.14314\">QLoRA</a> (Quantized Low-Rank Adaptation) for efficient fine-tuning. 
The system requires Python 3.10/3.11 and approximately 500GB of disk space for full operation.</p>\n<h3>\nArchitectural Components</h3>\n<p>\nThe system operates through three primary components:</p>\n<ul>\n <li>\n<strong>Taxonomy Repository</strong>: A structured collection of knowledge and skills, organised in YAML files (max 2300 words per Q&A pair) </li>\n <li>\n<strong>Synthetic Data Generator</strong>: Uses a teacher model (default: Mixtral/Mistral instruct for full pipeline, Merlinite 7b for simple) to transform taxonomy entries into diverse training examples </li>\n <li>\n<strong>Training Pipeline System</strong>: <a href=\"https://arxiv.org/abs/2305.14314\">QLoRA</a>-based training options optimised for different hardware configurations </li>\n</ul>\n<h3>\nTraining Pipelines</h3>\n<p>\n<a href=\"https://instructlab.ai/\">InstructLab</a> offers three specialised training pipelines:</p>\n<ol>\n <li>\n <p>\n<strong>Simple Pipeline</strong> </p>\n <ul>\n <li>\nFast training (~1 hour) </li>\n <li>\nUses SFT Trainer (Linux) or MLX (macOS) </li>\n <li>\nGreat for initial experiments and validation </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Full Pipeline</strong> </p>\n <ul>\n <li>\nCustom <a href=\"https://arxiv.org/abs/2305.14314\">QLoRA</a> training loop optimised for CPU/MPS </li>\n <li>\nEnhanced data processing functions </li>\n <li>\nMemory requirement: 32GB RAM </li>\n <li>\nBalanced performance and accessibility </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Accelerated Pipeline</strong> </p>\n <ul>\n <li>\nGPU-accelerated distributed <a href=\"https://arxiv.org/abs/2305.14314\">QLoRA</a> training </li>\n <li>\nSupports NVIDIA CUDA and AMD ROCm </li>\n <li>\nRequires 18GB+ GPU memory </li>\n <li>\nIdeal for production-grade fine-tuning </li>\n </ul>\n </li>\n</ol>\n<h3>\nHardware Support and Quantisation</h3>\n<p>\n<a href=\"https://instructlab.ai/\">InstructLab</a> supports various hardware configurations with automatic quantisation:</p>\n<ul>\n <li>\nApple M-series chips: MLX optimisation, MPS acceleration </li>\n <li>\nNVIDIA GPUs: CUDA 
support, 4-bit quantisation available </li>\n <li>\nAMD GPUs: ROCm support, similar quantisation options </li>\n <li>\nStandard CPUs: Optimised quantisation for memory efficiency </li>\n</ul>\n<h2>\nPractical Workflow</h2>\n<p>\nWith the architectural foundation established, <a href=\"https://instructlab.ai/\">InstructLab</a> provides a systematic approach to implementing these components through a straightforward command-line interface. The following sections detail the practical steps to leverage this architecture effectively.</p>\n<h3>\nSetup and Installation</h3>\n<pre><code class=\"bash\">pip install instructlab\nilab config init</code></pre>\n<p>\nKey requirements:</p>\n<ul>\n <li>\nPython 3.10 or 3.11 (>=3.12 not supported[^1]) </li>\n <li>\n500GB recommended disk space </li>\n <li>\n16GB RAM minimum, 32GB recommended </li>\n</ul>\n<h3>\nCore Workflow Steps</h3>\n<ol>\n <li>\n <p>\n<strong>Model Acquisition</strong> <br>\n<code class=\"inline\">ilab model download</code> </p>\n <ul>\n <li>\nDownloads pre-trained base models </li>\n <li>\nSupports GGUF (4-bit to 16-bit) and Safetensors formats </li>\n <li>\nAutomatic quantisation with configurable parameters </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Synthetic Data Generation</strong> <br>\n<code class=\"inline\">ilab model serve</code> <br>\n<code class=\"inline\">ilab data generate --pipeline [simple|full]</code> </p>\n <p>\nCommon issues and solutions: </p>\n <ul>\n <li>\nServer conflicts: Use different ports with <code class=\"inline\">--port</code> </li>\n <li>\nMemory errors: Reduce batch size or use <code class=\"inline\">--pipeline simple</code> </li>\n <li>\nTeacher model issues: Verify model checksum and try re-downloading </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Training</strong> <br>\n<code class=\"inline\">ilab model train</code> </p>\n <p>\nHyperparameters (configurable in config.yaml): </p>\n <ul>\n <li>\nMax epochs: 10 </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Evaluation</strong> <br>\n<code class=\"inline\">ilab model evaluate</code> </p>\n <p>\nBenchmarks and score scales: </p>\n <ul>\n <li>\n<a href=\"http://en.wikipedia.org/wiki/MMLU\">MMLU</a>: Knowledge (0.0-1.0 scale) </li>\n <li>\n
MMLUBranch: Delta improvements </li>\n <li>\nMTBench: Skills (0.0-10.0 scale) </li>\n <li>\nMTBenchBranch: Skill improvements </li>\n </ul>\n </li>\n</ol>\n<h3>\nModel Deployment</h3>\n<pre><code class=\"bash\">ilab model serve --model-path &lt;new-model-path&gt;\nilab model chat -m &lt;new-model-path&gt; # Optionally, chat with the model</code></pre>\n<p>\nDeployment considerations:</p>\n<ul>\n <li>\nVerify quantisation level matches hardware capabilities </li>\n <li>\nMonitor memory usage during serving </li>\n <li>\nConsider temperature settings for inference (default: 1.0) </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\n<a href=\"https://instructlab.ai/\">InstructLab</a> represents a significant advancement in democratising LLM fine-tuning, bridging the gap between research capabilities and practical deployment. Through its innovative LAB methodology and <a href=\"https://arxiv.org/abs/2305.14314\">QLoRA</a>-based implementation, it makes sophisticated model adaptation accessible to practitioners across different hardware configurations.</p>\n<h3>\nKey Advantages</h3>\n<ul>\n <li>\n<strong>Accessibility</strong>: From laptops to data centres, <a href=\"https://instructlab.ai/\">InstructLab</a> scales with available resources </li>\n <li>\n<strong>Flexibility</strong>: Multiple training pipelines accommodate different needs and constraints </li>\n <li>\n<strong>Systematic</strong>: Structured approach to knowledge and skill injection through taxonomy </li>\n <li>\n<strong>Verifiable</strong>: Comprehensive evaluation suite ensures quality of fine-tuned models </li>\n</ul>\n<h3>\nPractical Impact</h3>\n<p>\n<a href=\"https://instructlab.ai/\">InstructLab</a> enables organisations to:</p>\n<ul>\n <li>\nCreate domain-specialised models without massive compute resources </li>\n <li>\nSystematically inject new capabilities through structured knowledge representation </li>\n <li>\nValidate improvements through quantitative benchmarks </li>\n <li>\nDeploy fine-tuned models with minimal operational overhead 
</li>\n</ul>\n<h3>\nLimitations and Considerations</h3>\n<ul>\n <li>\n <p>\n<strong>Model Constraints</strong>: Currently supports models up to 7B parameters effectively </p>\n </li>\n <li>\n <p>\n<strong>Resource Timeline</strong>: Typical deployment cycle from setup to production: </p>\n <ul>\n <li>\nInitial setup: a few hours </li>\n <li>\nSynthetic data generation: 15 minutes to 1+ hours depending on computing resources </li>\n <li>\nTraining: several hours on consumer hardware </li>\n <li>\nEvaluation and deployment: a few hours </li>\n </ul>\n </li>\n <li>\n <p>\n<strong>Maintenance Requirements</strong>: </p>\n <ul>\n <li>\nRegular model evaluations against new benchmarks </li>\n <li>\nPeriodic retraining with updated taxonomy </li>\n <li>\nSystem updates and dependency management </li>\n <li>\nStorage management for checkpoints and datasets </li>\n </ul>\n </li>\n</ul>\n<h3>\nRAG vs Fine-tuning</h3>\n<p>\nIt’s important to recognise that fine-tuning isn’t always the optimal solution. For dynamic, frequently changing knowledge bases, Retrieval-Augmented Generation (RAG) often provides a more practical and maintainable solution. Fine-tuning through <a href=\"https://instructlab.ai/\">InstructLab</a> is most valuable for:</p>\n<ul>\n <li>\nStable knowledge domains (e.g., natural sciences, engineering) </li>\n <li>\nConsistent skill enhancement needs </li>\n <li>\nCases where inference latency is critical </li>\n</ul>\n<p>\nThe system’s architecture strikes a careful balance between computational efficiency and training effectiveness, making it a practical tool for both experimentation and production use. While not eliminating the complexity of LLM fine-tuning entirely, <a href=\"https://instructlab.ai/\">InstructLab</a> significantly reduces the technical barriers to entry in this crucial domain.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Python version compatibility remains a significant consideration in the ML ecosystem. While newer versions (≥3.12) offer improved performance, they often lack compatibility with essential ML frameworks. 
This constraint informs <a href=\"https://instructlab.ai/\">InstructLab</a>’s current version requirements.</p>\n",
"tags": [
"ai",
"llm",
"model-governance",
"production",
"quantisation",
"python",
"mlops",
"best-practices",
"data-science"
]
},
{
"date": "2024-12-07",
"title": "💡 TIL: Understanding GGUF Model Quantisation",
"url": "/posts/TIL-llm-quantisation.html",
"content": "<p>\n<strong>TL;DR:</strong> GGUF quantisation converts LLM weights from 16-bit to lower precision formats (2-bit to 6-bit) to run large models on consumer hardware. Each format offers different tradeoffs between size, speed, and quality, with Q4_K_S (4-bit) representing the sweet spot for most users—providing 3.7x size reduction while maintaining good quality. Mixed precision strategies (_S/_M/_L variants) further optimize performance by targeting attention and feed-forward layers with higher precision bits.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nWhen experimenting with larger language models (12B, 30B, 70B etc.), choosing the right quantisation format becomes crucial for striking a good balance i.e. running them on consumer hardware while maintaining reasonably good performance. I wrote this guide after spending time looking up different GGUF quantisation types to optimise model selection for my machine’s constraints. This guide explains quantisation methods and their practical tradeoffs to help the reader select the optimal format for their setup.\\ The quantisation formats discussed here are implemented in popular frameworks like <a href=\"https://github.com/ggerganov/llama.cpp\">llama.cpp</a>. Q4_K_S is typically the default format due to its good balance of size, speed, and quality, while Q2_K and Q3_K variants are offered for more constrained systems.</p>\n<h2>\nWhat is Quantisation?</h2>\n<p>\nQuantisation converts model weights from 16-bit floating point (F16) to lower precision formats using fixed-size blocks. Each block contains multiple weights that share scaling parameters.\\ Perplexity is the key metric used to measure model quality after quantisation. It indicates how well the model predicts text, the lower the perplexity the better the predictions. For example, a change from 5.91 to 6.78 perplexity represents a noticeable but often acceptable drop in prediction quality. 
A model with perplexity 6.78 is slightly less certain about its predictions than one with perplexity 5.91.</p>\n<h2>\nBasic Quantisation Types and K-Quantisation</h2>\n<p>\nK-quantisation is a way to make AI models smaller using two methods to store weights (the model’s numbers):</p>\n<ol>\n <li>\nType-0 (simpler): reconstructs weight as <code class=\"inline\">weight = scale × quant</code> </li>\n <li>\nType-1 (more precise): reconstructs weight as <br>\n<code class=\"inline\">weight = scale × quant + minimum</code> </li>\n</ol>\n<p>\nThe “block minimum” <code class=\"inline\">minimum</code> is the smallest value found in a group of weights. By tracking this minimum, we can represent the other values more precisely relative to it, rather than having to represent their full absolute values.</p>\n<p>\nEach format groups weights into “super-blocks” to save space. Specifically:</p>\n<p>\nQ2_K (2-bit):</p>\n<ul>\n <li>\nUses Type-1 formula </li>\n <li>\nOrganises weights in groups of 256 (16 blocks × 16 weights) </li>\n <li>\nUses 4 bits to store both scales and minimums </li>\n <li>\nTakes exactly 2.5625 bits per weight </li>\n <li>\nResult: Shrinks a 13GB model to 2.67GB, but quality drops (perplexity <br>\nincreases from 5.91 to 6.78) </li>\n</ul>\n<p>\nQ3_K (3-bit):</p>\n<ul>\n <li>\nUses Type-0 formula (simpler one) </li>\n <li>\nSame organisation: 16 blocks × 16 weights </li>\n <li>\nUses 6 bits to store scales </li>\n <li>\nTakes exactly 3.4375 bits per weight </li>\n <li>\nBetter quality than Q2_K but bigger file size </li>\n</ul>\n<p>\nQ4_K (4-bit):</p>\n<ul>\n <li>\nUses Type-1 formula </li>\n <li>\nDifferent organisation: 8 blocks × 32 weights = 256 total </li>\n <li>\nUses 6 bits for both scales and minimums </li>\n <li>\nTakes exactly 4.5 bits per weight </li>\n <li>\nMuch better quality, file size around 3.56GB </li>\n</ul>\n<p>\nQ5_K (5-bit):</p>\n<ul>\n <li>\nUses Type-1 formula </li>\n <li>\nSame organisation as Q4_K </li>\n <li>\nAlso uses 6 bits for 
scales and minimums </li>\n <li>\nTakes exactly 5.5 bits per weight </li>\n <li>\nQuality getting very close to original </li>\n</ul>\n<p>\nQ6_K (6-bit):</p>\n<ul>\n <li>\nUses Type-0 formula </li>\n <li>\nBack to 16 blocks × 16 weights </li>\n <li>\nUses 8 bits for scales </li>\n <li>\nTakes exactly 6.5625 bits per weight </li>\n <li>\nAlmost perfect quality, file size 5.15GB </li>\n</ul>\n<p>\nThe main tradeoff: Fewer bits means smaller files but lower quality. More bits means better quality but larger files. This lets users choose what works best for their needs.<br>\nWhen compressing numbers in Type-1 quantisation, each block keeps track of its smallest value (the minimum). When reconstructing the weights, this minimum is added back after multiplication. This helps preserve the range of values more accurately than just using scaling alone.</p>\n<p>\nA simple way to think of this concept is:</p>\n<ul>\n <li>\nType-0 just stretches/shrinks values using a scale </li>\n <li>\nType-1 first shifts all numbers by subtracting the minimum (making them smaller), then scales them for storage, and when reconstructing adds the minimum back </li>\n</ul>\n<p>\nThis is why Type-1 generally gives better quality results but needs more storage space. It has to keep track of both the scale and minimum for each block.</p>\n<h2>\nMixed Precision Strategies</h2>\n<p>\nK-quantisations use different precision levels for different model components. 
From the <a href=\"https://github.com/ggerganov/llama.cpp\">llama.cpp</a> documentation, there are three variants:</p>\n<ul>\n <li>\n <p>\nS (Small): Uses a single quantisation type throughout. Example using Q3_K_S: </p>\n <blockquote>\n <p>\nAll model tensors → Q3_K (3-bit)<br>\nResult: 2.75GB size, 6.46 perplexity (7B model) </p>\n </blockquote>\n </li>\n <li>\n <p>\nM (Medium): Strategic mixed precision. Example using Q3_K_M: </p>\n <blockquote>\n <p>\nattention.wv[^1], attention.wo[^2], feed_forward.w2[^3] → Q4_K (4-bit)<br>\nAll other tensors → Q3_K (3-bit)<br>\nResult: 3.06GB size, 6.15 perplexity (7B model) </p>\n </blockquote>\n </li>\n <li>\n <p>\nL (Large): Higher precision mix. Example using Q3_K_L: </p>\n <blockquote>\n <p>\nattention.wv[^1], attention.wo[^2], feed_forward.w2[^3] → Q5_K (5-bit)<br>\nAll other tensors → Q3_K (3-bit)<br>\nResult: 3.35GB size, 6.09 perplexity (7B model) </p>\n </blockquote>\n </li>\n</ul>\n<p>\nThese strategies target attention and feed-forward layers with higher precision because they directly impact text processing quality, as demonstrated by the perplexity improvements in benchmarks: Q3_K_S (6.46) → Q3_K_M (6.15) → Q3_K_L (6.09).<br>\nThe improvement in perplexity scores demonstrates why mixed precision strategies are effective, though they require more storage space.</p>\n<h2>\nPerformance Comparison (7B model)</h2>\n<pre><code>Format | Size(GB) | Reduction | BPW | Perplexity | RTX4080 | M2Max\nF16 | 13.0 | 1.0x | 16.0 | 5.91 | 60.0ms | 116ms\nQ2_K | 2.67 | 4.9x | 2.56 | 6.78 | 15.5ms | 56ms\nQ3_K_S | 2.75 | 4.7x | 3.44 | 6.46 | 18.6ms | 81ms\nQ4_K_S | 3.56 | 3.7x | 4.50 | 6.02 | 15.5ms | 50ms\nQ6_K | 5.15 | 2.5x | 6.56 | 5.91 | 18.3ms | 75ms</code></pre>\n<p>\n*BPW = Bits Per Weight, Speed in milliseconds per token</p>\n<p>\nPractical Recommendations:</p>\n<ul>\n <li>\nBalanced Performance: Q4_K_S </li>\n <li>\nMaximum Compression: Q2_K </li>\n <li>\nBest Quality: Q6_K (matches F16) </li>\n <li>\nLimited RAM: Q2_K or Q3_K </li>\n <li>\nGPU 
Inference: Q4_K (optimal speed/quality) </li>\n</ul>\n<p>\nAll data are from recent <a href=\"https://github.com/ggerganov/llama.cpp/pull/1684\">llama.cpp</a> performance benchmarks and <a href=\"https://github.com/ggerganov/ggml\">GGML</a> implementation details.</p>\n<h2>\nMemory Requirements for Inference</h2>\n<p>\nRunning a quantised model requires more RAM than the model file size alone, because inference adds its own overhead. Memory requirements depend on several factors:</p>\n<ul>\n <li>\nModel architecture and size </li>\n <li>\nBatch size for inference </li>\n <li>\nNumber of layers loaded at once </li>\n <li>\nOperating system and framework overhead </li>\n</ul>\n<p>\nFor 7B models (verified from benchmarks):</p>\n<pre><code>Format | Model Size | Note\nF16 | 13.0GB | Base format\nQ4_K_S | 3.56GB | Common choice\nQ3_K_S | 2.75GB | Minimum size\nQ6_K | 5.15GB | Highest quality</code></pre>\n<p>\nFor larger models, scale the memory requirements proportionally and ensure additional overhead memory is available for inference. Test with smaller models first to gauge the system’s capabilities.<br>\nActual RAM/VRAM requirements will be higher than the model size. Consider monitoring memory usage during inference to determine exact requirements for a specific setup.<br>\nHere is an example memory usage scenario for a Q4_K_S 7B model:</p>\n<ul>\n <li>\nModel size: 3.56GB </li>\n <li>\nInference overhead: ~2GB for standard settings </li>\n <li>\nOperating system buffer: ~1GB recommended </li>\n <li>\nTotal recommended free memory: ~7GB </li>\n</ul>\n<p>\nThis explains why a model that’s “3.56GB” might need 6-7GB of free RAM/VRAM to run smoothly. The exact overhead varies based on your settings and system.</p>\n<h2>\nConclusion</h2>\n<p>\nModern quantisation techniques offer multiple ways to run large language models on consumer hardware. 
Here’s what we need to remember:</p>\n<ul>\n <li>\nK-quantisation provides the best balance through super-blocks and mixed precision strategies </li>\n <li>\nQ4_K_S (4-bit) represents the current sweet spot for most users, offering: \n <ul>\n <li>\n3.7x size reduction </li>\n <li>\nGood perplexity (6.02) </li>\n <li>\nExcellent inference speed on both GPU and CPU </li>\n </ul>\n </li>\n <li>\nFor more constrained setups, Q2_K/Q3_K variants can run larger models with acceptable quality loss </li>\n <li>\nHigher bits (Q5_K, Q6_K) approach F16 quality but require more memory </li>\n <li>\nThe _S/_M/_L variants let the user fine-tune the quality-size tradeoff by adjusting precision where it matters most </li>\n</ul>\n<p>\nBefore downloading a quantised model, check the system’s available RAM and choose a format that leaves enough memory for comfortable operation. For most users with modern GPUs, Q4_K variants will provide the best experience.</p>\n<hr class=\"thin\">\n<p>\n[^1]: In <a href=\"https://github.com/ggerganov/llama.cpp/tree/master/examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp\">llama.cpp</a>, <code class=\"inline\">attention.wv</code> refers to the tensor that holds the weights producing the value vectors in the self-attention mechanism of the model. The value vectors carry the content that is combined, according to the attention weights, when the model generates responses.</p>\n<p>\n[^2]: <code class=\"inline\">attention.wo</code> refers to the weight matrix used in the output layer of the attention mechanism within a transformer model. It plays a crucial role in transforming the attention output into the final representation that is used for generating predictions.</p>\n<p>\n[^3]: In LLaMA-style models, <code class=\"inline\">feed_forward.w1</code> projects the input to a higher-dimensional space, where a non-linear activation is applied, enabling the capture of complex features. 
<code class=\"inline\">feed_forward.w3</code> is a parallel projection whose output gates the activated <code class=\"inline\">w1</code> output through element-wise multiplication, and <code class=\"inline\">feed_forward.w2</code> projects the result back to the original dimension. These matrices collectively enable the feed-forward network to transform and learn from the input effectively, contributing to the overall performance of the transformer model.</p>\n",
"tags": [
"ai",
"llm",
"energy-reduction",
"performance",
"quantisation"
]
},
{
"date": "2024-12-05",
"title": "💡 TIL: LLM Evaluation using Critique Shadowing",
"url": "/posts/TIL-llm-eval-critique-shadowing.html",
"content": "<p>\n<strong>TL;DR:</strong> Critique Shadowing offers an expert-centered approach to LLM evaluation by starting with binary pass/fail judgments and detailed critiques before building automated systems. This iterative methodology—reminiscent of 1970s knowledge engineering—prioritizes domain expertise over complex metrics, revealing valuable insights about products and users while developing reliable evaluation systems that capture nuanced quality standards.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nAs LLMs increasingly drive critical business decisions, ensuring their reliability becomes paramount. Many teams struggle with complex metrics and scoring systems that lead to confusion rather than clarity. <a href=\"https://hamel.dev/\">Hamel Husain</a>‘s Critique Shadowing methodology[^1] offers a systematic path from drowning in metrics to developing reliable evaluation systems.</p>\n<h2>\nThe Critique Shadowing Method</h2>\n<p>\nThe key insight behind Critique Shadowing is deceptively simple: start with binary (pass/fail) expert judgements and detailed critiques before building automated evaluation systems. This approach solves two critical challenges: capturing domain expertise and scaling evaluation processes.</p>\n<p>\nThis expert-centric approach echoes <a href=\"https://en.wikipedia.org/wiki/Knowledge_engineering\">knowledge engineering</a> practices from the 1970-80s, when AI researchers first recognised the necessity of systematically capturing domain expertise. Just as <a href=\"https://en.wikipedia.org/wiki/Mycin\">MYCIN</a>‘s creators worked closely with medical doctors to encode diagnostic knowledge, Critique Shadowing similarly structures the process of extracting expert judgement for LLM evaluation. 
While the technology has evolved from rule-based systems to large language models, the fundamental challenge of effectively capturing and operationalising expert knowledge remains central.</p>\n<h3>\nImplementation Process</h3>\n<p>\nThe methodology follows a structured, iterative process:</p>\n<center>\n <img src=\"https://raw.githubusercontent.com/ai-mindset/ai-mindset.github.io/refs/heads/master/images/Critique%20Framework%20Hamel%20Husain.png\" width=\"80%\" height=\"80%\"/></center>\n<ol>\n <li>\nIdentify a principal domain expert as the arbiter of quality </li>\n <li>\nCreate a diverse dataset covering different scenarios and user types </li>\n <li>\nExpert conducts binary pass/fail judgements with detailed critiques </li>\n <li>\nAddress discovered issues and verify fixes </li>\n <li>\nDevelop LLM-based judges using expert critiques as few-shot examples </li>\n <li>\nAnalyse error patterns and root causes </li>\n <li>\nCreate specialised judges for persistent issues </li>\n</ol>\n<p>\nThe process is continuous, repeating periodically or when material changes occur. For simpler applications or when manual review is feasible, teams can adapt or streamline these steps while maintaining the core principle of systematic data examination.</p>\n<h2>\nBeyond Automation</h2>\n<p>\nHusain’s most striking observation is that the process of developing evaluation systems often provides more value than the resulting automated judges. The systematic collection of expert feedback reveals product insights, user needs, and failure modes that might otherwise remain hidden. This understanding drives improvements in the core system, not just its evaluation.</p>\n<h2>\nConclusion</h2>\n<p>\nThe Critique Shadowing methodology succeeds by prioritising expert knowledge and systematic data collection over premature automation. 
For teams building LLM applications, this approach offers a clear path to reliable evaluation systems while simultaneously deepening their understanding of their product and users.<br>\nLLM evaluation is an active area of interest and research both in academia and industry. Here is a short list of resources to look into:</p>\n<ul>\n <li>\n<a href=\"https://www.ibm.com/think/topics/llm-evaluation\">IBM LLM Evaluation</a> </li>\n <li>\n<a href=\"https://docs.mistral.ai/guides/evaluation/\">Mistral AI- Evaluation</a> </li>\n <li>\n<a href=\"https://github.com/mistralai/mistral-evals\">Mistral Evals</a> </li>\n <li>\n<a href=\"https://docs.anthropic.com/en/docs/test-and-evaluate/eval-tool\">Anthropic- Using the Evaluation Tool</a> </li>\n <li>\n<a href=\"https://dev.to/guybuildingai/-top-5-open-source-llm-evaluation-frameworks-in-2024-98m\">Top 5 Open-Source LLM Evaluation Frameworks in 2024</a> </li>\n</ul>\n<hr class=\"thin\">\n<p>\n[^1]: Husain, H. (2024). “Creating a LLM-as-a-Judge That Drives Business Results”. <a href=\"https://hamel.dev/blog/posts/llm-judge/\">https://hamel.dev/blog/posts/llm-judge/</a></p>\n",
"tags": [
"til",
"llm",
"ai",
"machine-learning",
"mlops",
"best-practices",
"production",
"model-governance",
"evaluation",
"observability",
"monitoring",
"quality-assurance",
"iterative-refinement"
]
},
{
"date": "2024-12-03",
"title": "✍ A Path to Maintainable AI Systems using Norman's Design Principles",
"url": "/posts/design-principles-ds-ai.html",
"content": "<p>\n<strong>TL;DR:</strong> Don Norman’s timeless design principles- visibility, feedback, constraints, mappings, and error prevention- apply powerfully to AI systems, where abstract interfaces and complex workflows often become overwhelming. By implementing these principles with a carefully selected, minimal toolset, we can create maintainable, observable AI systems that reduce complexity while providing comprehensive functionality- just as Norman observed in physical objects, good design in AI leads to fewer errors and greater user satisfaction.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nDon Norman’s principles of good design, outlined in <a href=\"https://archive.org/details/thedesignofeverydaythingsbydonnorman\">The Design of Everyday Things</a>, are particularly relevant to Data Science and AI Engineering, where systems often suffer from unnecessary complexity. This article presents a minimalist approach to implementing these principles using a carefully selected set of tools that maximise impact while reducing operational overhead. Norman’s insights about visibility, feedback, constraints, and mappings translate powerfully to AI system design, where abstract interfaces and complex workflows can easily become overwhelming. Just as Norman observed that poorly designed physical objects lead to user frustration and errors, poorly architected AI systems can result in maintenance nightmares, hidden failure modes, and costly debugging cycles. By applying his principles- making system states visible, providing clear feedback, implementing appropriate constraints, and creating natural mappings between components, we can build AI systems that are not only more intuitive to use but also easier to maintain, debug, and evolve over time.</p>\n<h2>\nDesign Principles Implementation</h2>\n<h3>\n1. 
Visibility</h3>\n<p>\nImplement comprehensive system observability using <a href=\"https://mlflow.org/\">MLflow</a> as your central platform:</p>\n<ul>\n <li>\nTrack experiments, parameters, and metrics </li>\n <li>\nVersion models and artefacts </li>\n <li>\nLog production predictions and outcomes </li>\n <li>\nMonitor model performance metrics </li>\n</ul>\n<p>\nFor system-level metrics, use <a href=\"https://prometheus.io/docs/visualization/grafana/\">Prometheus/Grafana</a> to:</p>\n<ul>\n <li>\nTrack resource utilisation (CPU, memory, latency) </li>\n <li>\nMonitor prediction throughput </li>\n <li>\nCreate dashboards for key performance indicators </li>\n</ul>\n<p>\nImplement hash-based sampling for high-volume systems, so that only a fixed fraction of requests is logged:</p>\n<pre><code class=\"python\">import zlib\n\ndef should_log(request_id, sampling_rate=0.1):\n    # crc32 is stable across processes, unlike the salted built-in hash()\n    return zlib.crc32(str(request_id).encode()) % 100 < sampling_rate * 100</code></pre>\n<h3>\n2. Feedback</h3>\n<p>\nUse <a href=\"https://prometheus.io/docs/visualization/grafana/\">Prometheus/Grafana</a> for real-time monitoring and alerting:</p>\n<ul>\n <li>\nSet up alerts for model performance degradation </li>\n <li>\nMonitor data distribution shifts </li>\n <li>\nTrack system health metrics </li>\n <li>\nConfigure tiered alerting based on severity </li>\n</ul>\n<p>\nExample metric collection:</p>\n<pre><code class=\"python\">from prometheus_client import Counter, Histogram\n\nPREDICTIONS = Counter('model_predictions_total', 'Total predictions made')\nLATENCY = Histogram('prediction_latency_seconds', 'Time spent processing prediction')\n\ndef predict(features):\n    # `model` is assumed to be a trained model loaded elsewhere\n    with LATENCY.time():\n        prediction = model.predict(features)\n    PREDICTIONS.inc()\n    return prediction</code></pre>\n<h3>\n3. 
Constraints</h3>\n<p>\nImplement data and model guardrails using <a href=\"https://greatexpectations.io/\">Great Expectations</a>:</p>\n<ul>\n <li>\nDefine data quality expectations </li>\n <li>\nSet distribution bounds for features </li>\n <li>\nMonitor for data drift </li>\n <li>\nGenerate validation reports </li>\n</ul>\n<p>\nExample constraint implementation:</p>\n<pre><code class=\"python\">import great_expectations as ge\n\ndef validate_features(df):\n    # Legacy (pre-1.0) Great Expectations API: wrap a pandas DataFrame\n    dataset = ge.from_pandas(df)\n    dataset.expect_column_values_to_be_between('age', 0, 120)\n    dataset.expect_column_values_to_not_be_null('critical_feature')\n    validation_result = dataset.validate()\n    return validation_result.success</code></pre>\n<h3>\n4. Mappings</h3>\n<p>\nUse <a href=\"https://mlflow.org/\">MLflow</a> to maintain clear relationships between:</p>\n<ul>\n <li>\nExperiments and business objectives </li>\n <li>\nModels and their training data </li>\n <li>\nPredictions and outcomes </li>\n <li>\nPerformance metrics and business KPIs </li>\n</ul>\n<p>\nExample mapping structure:</p>\n<pre><code class=\"python\">import mlflow\n\n# data_hash and revenue_improvement are assumed to be computed earlier\nwith mlflow.start_run(run_name='production_model_v1'):\n    mlflow.log_param('business_objective', 'customer_churn')\n    mlflow.log_param('data_version', data_hash)\n    mlflow.log_metric('business_impact', revenue_improvement)\n    mlflow.log_artifact('feature_importance.json')</code></pre>\n<h3>\n5. 
Error Prevention and Recovery</h3>\n<p>\nIntegrate safeguards using your core toolset:</p>\n<p>\n<a href=\"https://mlflow.org/\">MLflow</a>:</p>\n<ul>\n <li>\nVersion control for models and artefacts </li>\n <li>\nRollback capabilities </li>\n <li>\nExperiment tracking for reproducibility </li>\n</ul>\n<p>\n<a href=\"https://prometheus.io/docs/visualization/grafana/\">Prometheus/Grafana</a>:</p>\n<ul>\n <li>\nEarly warning system for issues </li>\n <li>\nPerformance degradation detection </li>\n <li>\nResource exhaustion prevention </li>\n</ul>\n<p>\n<a href=\"https://greatexpectations.io/\">Great Expectations</a>:</p>\n<ul>\n <li>\nData quality validation </li>\n <li>\nSchema enforcement </li>\n <li>\nDistribution monitoring </li>\n</ul>\n<p>\nExample error prevention:</p>\n<pre><code class=\"python\">ERROR_COUNTER = Counter('model_prediction_errors_total', 'Total failed predictions')\n\ndef safe_predict(features):\n    # fallback_prediction() is assumed to return a safe default value\n    if not validate_features(features):\n        return fallback_prediction()\n\n    try:\n        with LATENCY.time():\n            prediction = model.predict(features)\n        PREDICTIONS.inc()\n        return prediction\n    except Exception:\n        ERROR_COUNTER.inc()\n        return fallback_prediction()</code></pre>\n<h2>\nImplementation Strategy</h2>\n<ol>\n <li>\n <p>\nStart with <a href=\"https://mlflow.org/\">MLflow</a> </p>\n <ul>\n <li>\nSet up experiment tracking </li>\n <li>\nImplement model versioning </li>\n <li>\nConfigure basic logging </li>\n </ul>\n </li>\n <li>\n <p>\nAdd <a href=\"https://prometheus.io/docs/visualization/grafana/\">Prometheus/Grafana</a> </p>\n <ul>\n <li>\nDeploy basic monitoring </li>\n <li>\nSet up key alerts </li>\n <li>\nCreate essential dashboards </li>\n </ul>\n </li>\n <li>\n <p>\nIntegrate <a href=\"https://greatexpectations.io/\">Great Expectations</a> </p>\n <ul>\n <li>\nDefine core data quality rules </li>\n <li>\nImplement validation pipelines </li>\n <li>\nMonitor data distributions </li>\n </ul>\n </li>\n</ol>\n<h2>\nConclusion</h2>\n<p>\nBy focusing on a minimal set of powerful tools (<a href=\"https://mlflow.org/\">MLflow</a>, <a 
href=\"https://prometheus.io/docs/visualization/grafana/\">Prometheus/Grafana</a>, and <a href=\"https://greatexpectations.io/\">Great Expectations</a>), you can implement Norman’s design principles effectively while maintaining system simplicity. This approach provides:</p>\n<ul>\n <li>\nComprehensive visibility through unified logging and monitoring </li>\n <li>\nImmediate feedback via real-time alerts </li>\n <li>\nStrong constraints through data validation </li>\n <li>\nClear mappings between components </li>\n <li>\nRobust error prevention and recovery </li>\n</ul>\n<p>\nThe key is to fully utilise these core tools rather than adding complexity with additional solutions. This creates maintainable, observable, and reliable AI systems that can scale with your needs.</p>\n",
"tags": [
"ai",
"data-science",
"design-principles",
"code-quality",
"mlops",
"monitoring",
"observability",
"production",
"model-governance",
"minimal"
]
},
{
"date": "2024-12-02",
"title": "🐼 Pandas or 🐻❄️ Polars?",
"url": "/posts/pandas-polars.html",
"content": "<p>\n<strong>TL;DR:</strong> While Pandas excels at interactive exploration and smaller datasets with its Python-centric ecosystem, Polars leverages Rust’s performance for parallel processing and memory efficiency in large-scale data operations. Choose Pandas for rapid prototyping and datasets under 1GB, and Polars for production environments with demanding performance requirements or cross-language development needs in Python, Node.js, and Rust.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThe world of Python data processing has long revolved around the well established <a href=\"https://pandas.pydata.org/\">Pandas</a> library, but in recent years, a new contender has emerged in the form of <a href=\"https://pola.rs/\">Polars</a>. This post aims to provide a comparison of these two powerful data processing tools, that empowers the reader to make an informed choice on a case-by-case basis.</p>\n<h2>\nArchitecture and Design Comparison</h2>\n<p>\nAt the core, Pandas and Polars differ in their underlying implementation and design philosophies.</p>\n<h3>\nImplementation and Performance</h3>\n<p>\nThe Pandas library is written in Python/Cython, with a focus on single-threaded operations. In contrast, Polars is built upon the Rust programming language, leveraging its performance and concurrency capabilities to enable parallel processing by default.\\ This distinction in implementation has significant implications for memory management and query optimisation. Pandas typically works with multiple copies of data, while Polars utilises the Arrow data format, which allows for more efficient memory usage. 
Additionally, Polars offers automatic query optimisation, whereas Pandas users must rely on a more sequential, manual approach to optimising their data processing pipelines.</p>\n<table>\n <thead>\n <tr>\n <th>Feature</th>\n <th>Pandas</th>\n <th>Polars</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>Implementation</td>\n <td>Python/Cython</td>\n <td>Rust</td>\n </tr>\n <tr>\n <td>Processing</td>\n <td>Single-threaded</td>\n <td>Parallel by default</td>\n </tr>\n <tr>\n <td>Memory Management</td>\n <td>Multiple copies</td>\n <td>Arrow format</td>\n </tr>\n <tr>\n <td>Query Optimisation</td>\n <td>Sequential</td>\n <td>Automatic</td>\n </tr>\n </tbody>\n</table>\n<h3>\nAPI and Language Support</h3>\n<p>\nPandas and Polars also differ notably in API style and language support. Pandas, being a Python-only library, offers a mix of method chaining and attribute access. In contrast, Polars takes a more expansive approach, providing implementations in Python, Node.js, and Rust itself.</p>\n<p>\nThis language versatility enables JavaScript and TypeScript integration, allowing data scientists and developers to leverage the same performance benefits regardless of their preferred language. Additionally, Polars maintains a consistent method chaining syntax across these language environments, which flattens the learning curve for users who work with the library in multiple contexts.</p>\n<table>\n <thead>\n <tr>\n <th>Feature</th>\n <th>Pandas</th>\n <th>Polars</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>Language Support</td>\n <td>Python-only</td>\n <td>Python, Node.js, Rust</td>\n </tr>\n <tr>\n <td>API Style</td>\n <td>Mixed method chaining and attribute access</td>\n <td>Consistent method chaining</td>\n </tr>\n <tr>\n <td>Language Integration</td>\n <td>N/A</td>\n <td>JavaScript/TypeScript</td>\n </tr>\n </tbody>\n</table>\n<h2>\nUse Cases and Trade-offs</h2>\n<p>\nWhile both Pandas and Polars excel at data processing, each library has distinct strengths and weaknesses that suit it to different use cases.</p>\n<h3>\nWhen to Choose Pandas</h3>\n<p>\nPandas shines when it comes to interactive data exploration and working with smaller datasets, typically under 1GB in size. 
The library’s deep integration with the broader scientific computing ecosystem in Python, along with its intuitive syntax and extensive documentation, makes it an excellent choice for rapid prototyping, educational contexts, and projects that require seamless compatibility with the Python-centric data science toolchain.</p>\n<h3>\nWhen to Choose Polars</h3>\n<p>\nOn the other hand, Polars emerges as the preferred choice for large-scale data processing, particularly for datasets exceeding 1GB. The library’s Rust-based implementation and parallel processing capabilities make it a more suitable option for production environments with demanding performance requirements. Polars also excels in memory-constrained systems, thanks to its efficient use of the Arrow data format, and it is an attractive choice for cross-language development teams due to its implementations in Python, Node.js, and Rust.</p>\n<p>\nFurthermore, Polars demonstrates strengths in handling complex data transformations and time series processing at scale, areas where its optimised query engine and parallel processing features can truly shine.</p>\n<p>\nTo summarise the key differences:</p>\n<table>\n <thead>\n <tr>\n <th>Consideration</th>\n <th>Pandas</th>\n <th>Polars</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <td>Dataset Size</td>\n <td>Small to medium (&lt;1GB)</td>\n <td>Scales to larger datasets</td>\n </tr>\n <tr>\n <td>Performance</td>\n <td>Suitable for interactive exploration</td>\n <td>Excels at large-scale processing</td>\n </tr>\n <tr>\n <td>Memory Efficiency</td>\n <td>Works with multiple data copies</td>\n <td>Utilises Arrow format for efficiency</td>\n </tr>\n <tr>\n <td>Query Optimisation</td>\n <td>Sequential, manual approach</td>\n <td>Automatic optimisation</td>\n </tr>\n <tr>\n <td>Language Support</td>\n <td>Python-only</td>\n <td>Python, Node.js, Rust</td>\n </tr>\n <tr>\n <td>Ecosystem Integration</td>\n <td>Strong in Python scientific computing</td>\n <td>Limited cross-language integration</td>\n </tr>\n <tr>\n <td>Learning Resources</td>\n <td>Extensive documentation and community support</td>\n <td>Younger ecosystem, less comprehensive resources</td>\n </tr>\n </tbody>\n</table>\n<p>\nUltimately, the choice between Pandas and Polars should be guided by the specific requirements of your project, such as 
data volume, performance needs, language preferences, and ecosystem integration requirements. Both libraries offer powerful data processing capabilities, and selecting the right one can significantly impact the success and efficiency of your data-driven initiatives.</p>\n<h2>\nConclusion</h2>\n<p>\nThe choice between Pandas and Polars ultimately comes down to the specific requirements of your project and use case.</p>\n<p>\nFor projects focused on interactive data exploration and working with smaller datasets (under 1GB), Pandas remains the go-to choice. Its deep integration with the broader Python scientific computing ecosystem, extensive documentation, and large community make it a reliable and familiar option for many data scientists and developers.</p>\n<p>\nHowever, for large-scale data processing, production environments, and cross-language teams, Polars presents a compelling alternative. Its performance advantages, memory efficiency, and multi-language support (Python, Node.js, Rust) make it an increasingly attractive choice for modern data-intensive applications.</p>\n<p>\nWhen deciding between Pandas and Polars, consider factors such as dataset size, performance requirements, memory constraints, language preferences, and the level of ecosystem integration needed. Pandas may be the better fit for projects focused on rapid prototyping and educational use, while Polars can shine in mission-critical, large-scale data processing tasks.</p>\n<p>\nBoth Pandas and Polars are powerful data processing tools. As the data processing landscape continues to evolve, it’s valuable to stay informed about the trade-offs and emerging alternatives so that you can make an informed decision for your team and organisation.</p>\n",
"tags": [
"python",
"pandas",
"polars",
"data-processing",
"code-quality",
"toolchain",
"data-science"
]
},
{
"date": "2024-11-27",
"title": "📊 Ten Ways to Model Data",
"url": "/posts/modelling-mindsets.html",
"content": "<p>\n<strong>TL;DR:</strong> This comprehensive guide explores ten distinct modelling approaches across statistics, machine learning, and causal inference-advocating for “T-shaped” expertise where practitioners develop deep knowledge in one or two mindsets aligned with their domain needs whilst maintaining sufficient breadth to recognise when different approaches are required, with specific recommendations for research, business, and product development contexts.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nAs a practitioner looking to work effectively with real-world data and generate meaningful insights, I face a crucial decision: which modelling approaches should I invest my time and energy in learning? After discovering Christoph Molnar’s <a href=\"https://christophmolnar.com/books/modeling-mindsets/\">Modeling Mindsets</a>, I realised this isn’t about picking the “best” approach. It’s about becoming what he calls a “T-shaped modeller”.\\ The concept is elegantly simple: rather than trying to master every possible approach (impossible) or limiting myself to just one (ineffective), I should aim to develop:</p>\n<ul>\n <li>\nDeep expertise in one or two mindsets that align with my goals and problems </li>\n <li>\nWorking knowledge of other approaches to recognise when my primary tools <br>\naren’t optimal </li>\n</ul>\n<p>\nThis systematic exploration serves two purposes:</p>\n<ol>\n <li>\nTo understand the landscape: What are the main modelling mindsets available </li>\n <li>\nTo make an informed choice: Which mindset(s) should I focus on mastering, <br>\ntoday? What are their core premises, strengths, and limitations? given my goals and constraints? </li>\n</ol>\n<p>\nEach mindset represents a different way of approaching problems through data. 
From the probability-focused world of statistical modelling to the interactive realm of reinforcement learning, from the causality-oriented approach to the pattern-finding nature of unsupervised learning, each offers unique tools and perspectives.</p>\n<p>\nBy examining these mindsets systematically, I aim to make an informed decision about where to focus my learning efforts while maintaining enough breadth to recognise when I should switch approaches. This isn’t just about theoretical understanding; it’s about practical effectiveness in solving real-world problems.</p>\n<p>\nLet’s explore each mindset in turn, focusing on their fundamental premises, key strengths, and limitations to guide this decision.</p>\n<h2>\nStatistical Modelling: The Foundation of Data-Driven Inference</h2>\n<p>\n<em>This mindset sees the world through probability distributions. At its core, it’s about modelling how data is generated and making inferences under uncertainty.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nEverything has a distribution, from dice rolls to customer behaviours </li>\n <li>\nModels encode assumptions about how data is generated </li>\n <li>\nModels are evaluated by both checking if their assumptions make sense and measuring how well they match the data </li>\n <li>\nUses the same data for fitting and evaluation, unlike machine learning approaches </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nProvides rigorous mathematical framework for handling uncertainty </li>\n <li>\nStrong theoretical foundation spanning decades of research </li>\n <li>\nForces explicit consideration of data-generating processes </li>\n <li>\nVersatile for decisions, predictions, and understanding </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nManual and often tedious modelling process </li>\n <li>\nStruggles with complex data types like images and text </li>\n <li>\nGood model fit doesn’t guarantee good predictions </li>\n <li>\nLess automatable than modern machine learning 
approaches </li>\n</ol>\n<p>\nThis mindset serves as the foundation for three important sub-approaches: Frequentism, Bayesianism, and Likelihoodism, each with its own interpretation of probability and evidence. For someone starting in data science, understanding statistical modelling provides crucial groundwork for understanding both traditional statistics and modern machine learning approaches.</p>\n<h2>\nFrequentism: Making Decisions Through Repeated Experiments</h2>\n<p>\n<em>Frequentism views probability as long-run frequency and assumes that parameters in the world are fixed but unknown. It’s the dominant approach in many scientific fields, particularly in medicine and psychology.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nInterprets probability as frequency in infinite repetitions </li>\n <li>\nMakes decisions through hypothesis tests and confidence intervals </li>\n <li>\nRelies on “imagined experiments” to draw conclusions </li>\n <li>\nFocuses on estimating fixed, true parameters </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nEnables clear, binary decisions </li>\n <li>\nComputationally fast compared to other approaches </li>\n <li>\nNo need for prior information </li>\n <li>\nWidely accepted in scientific research </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nOften oversimplifies complex questions into yes/no decisions </li>\n <li>\nVulnerable to p-hacking (searching for significant results) </li>\n <li>\nInterpretation can be counterintuitive, especially for confidence intervals </li>\n <li>\nResults depend on the experimental design, not just the data </li>\n</ol>\n<p>\nFor practitioners, Frequentism offers a well-established framework with clear decision rules and strong scientific acceptance. 
However, its limitations in handling uncertainty and tendency toward oversimplification have led to growing interest in alternative approaches like Bayesian inference.</p>\n<h2>\nBayesianism: Learning Through Updated Beliefs</h2>\n<p>\n<em>Bayesianism stands out by treating parameters themselves as random variables with distributions, fundamentally different from Frequentism’s fixed-parameter view. It focuses on updating beliefs about parameters as new data arrives.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nRequires prior distributions before seeing data </li>\n <li>\nUpdates beliefs through Bayes’ theorem </li>\n <li>\nProduces complete posterior distributions, not just point estimates </li>\n <li>\nNaturally propagates uncertainty through all calculations[^1] </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nCan incorporate prior knowledge and expert opinions </li>\n <li>\nProvides complete probability distributions for parameters </li>\n <li>\nMore intuitive interpretation of uncertainty </li>\n <li>\nCleanly separates inference (getting posteriors) from decisions (using them) </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nChoosing priors can be difficult and controversial </li>\n <li>\nComputationally intensive, especially for complex models </li>\n <li>\nMathematically more demanding than frequentist approaches </li>\n <li>\nCan seem like overkill for simple decisions </li>\n</ol>\n<p>\nBayesianism offers a more complete and intuitive framework for handling uncertainty, but requires more computational resources and mathematical sophistication. 
It’s particularly valuable when prior knowledge is important or when understanding full uncertainty is crucial.</p>\n<h2>\nLikelihoodism: Pure Evidence Through Likelihood</h2>\n<p>\n<em>Likelihoodism attempts to reform statistical inference by focusing solely on likelihood as evidence, avoiding both Frequentism’s imagined experiments and Bayesianism’s subjective priors.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nUses likelihood ratios to compare hypotheses </li>\n <li>\nAdheres strictly to the likelihood principle </li>\n <li>\nRejects both prior probabilities and sampling distributions </li>\n <li>\nCompares models based on their relative evidence </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nMore coherent than Frequentism’s mixed toolkit </li>\n <li>\nAvoids subjective elements of Bayesianism </li>\n <li>\nIdeas work well within other statistical mindsets </li>\n <li>\nAdheres to likelihood principle (evidence depends only on observed data) </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nCannot make absolute statements, only relative comparisons </li>\n <li>\nNo clear mechanism for making final decisions </li>\n <li>\nLacks tools for expressing beliefs or uncertainty </li>\n <li>\nLess practical than other statistical approaches </li>\n</ol>\n<p>\nLikelihoodism offers interesting theoretical insights but may be less immediately useful than Frequentist or Bayesian approaches. 
It’s more valuable for understanding the foundations of statistical inference than for day-to-day data analysis.</p>\n<h2>\nCausal Inference: From Association to Causation</h2>\n<p>\n<em>Causal inference moves beyond correlation to understand what actually causes observed effects, providing a framework for analysing interventions and their impacts.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nUses Directed Acyclic Graphs (DAGs) to visualise relationships </li>\n <li>\nDistinguishes between association and causation </li>\n <li>\nRequires explicit encoding of causal assumptions </li>\n <li>\nCan work with both statistical models and machine learning </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nAddresses fundamental questions about cause and effect </li>\n <li>\nMakes assumptions explicit through DAGs </li>\n <li>\nModels tend to be more robust than pure association-based approaches </li>\n <li>\nProvides clear framework for analysing interventions </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nRequires identifying all relevant confounders </li>\n <li>\nCannot verify all causal assumptions from data alone </li>\n <li>\nMultiple competing frameworks can confuse newcomers </li>\n <li>\nMay sacrifice predictive performance for causal understanding </li>\n</ol>\n<p>\nFor practitioners, causal inference is essential when decisions about interventions are needed, though it requires careful consideration of assumptions and domain knowledge. 
It’s particularly valuable in fields like medicine, policy-making, and business strategy where understanding cause-effect relationships is crucial.</p>\n<h2>\nMachine Learning: Algorithms Learning from Data</h2>\n<p>\n<em>Machine learning approaches problems by having computers learn algorithms from data, focusing on task performance rather than theoretical underpinning.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nComputer-first approach to learning from data </li>\n <li>\nExternal evaluation based on task performance </li>\n <li>\nLess constrained by statistical assumptions </li>\n <li>\nIncludes supervised, unsupervised, reinforcement, and deep learning </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nTask-oriented and pragmatic approach </li>\n <li>\nHighly automatable </li>\n <li>\nWell-suited for building digital products </li>\n <li>\nStrong industry adoption and career opportunities </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nLess principled than statistical approaches </li>\n <li>\nMany competing approaches can be overwhelming </li>\n <li>\nModels often prioritise performance over interpretability </li>\n <li>\nUsually requires substantial data and computation </li>\n</ol>\n<p>\nFor practitioners, machine learning offers powerful tools for automation and prediction, particularly valuable in industry settings. 
It’s especially useful when theoretical understanding is less important than practical performance.</p>\n<h3>\nSupervised Learning: The Art of Prediction</h3>\n<p>\n<em>Supervised learning frames everything as a prediction problem, using labelled data to learn mappings from inputs to outputs.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nLearning is optimisation and search in hypothesis space </li>\n <li>\nModels evaluated on unseen data, not training data </li>\n <li>\nFocuses on generalising to new cases </li>\n <li>\nHighly automatable and competition-friendly </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nClear evaluation metrics </li>\n <li>\nHighly automatable </li>\n <li>\nStrong performance on prediction tasks </li>\n <li>\nWell-defined optimisation objectives </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nRequires labelled data </li>\n <li>\nModels often black-box (uninterpretable) </li>\n <li>\nNot hypothesis-driven </li>\n <li>\nMay miss causal relationships </li>\n <li>\nCan fail in unexpected ways when patterns change </li>\n</ol>\n<p>\nFor practitioners, supervised learning excels in prediction tasks where good labelled data exists and interpretability isn’t crucial. 
It’s particularly valuable in industry settings for automation and decision support.</p>\n<h3>\nUnsupervised Learning: Discovering Hidden Patterns</h3>\n<p>\n<em>This mindset focuses on finding inherent structures in data without labelled outcomes, making it ideal for exploratory analysis and pattern discovery.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nDiscovers patterns in data distributions </li>\n <li>\nIncludes clustering, dimensionality reduction, anomaly detection </li>\n <li>\nNo ground truth for validation </li>\n <li>\nMore open-ended than supervised learning </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nFinds patterns other approaches might miss </li>\n <li>\nExcellent for initial data exploration </li>\n <li>\nFlexible for undefined problems </li>\n <li>\nCan reveal natural groupings in data </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nHard to validate results objectively </li>\n <li>\nFeature weighting is often arbitrary </li>\n <li>\nSuffers from curse of dimensionality[^2] </li>\n <li>\nNo guarantee of finding meaningful patterns </li>\n</ol>\n<p>\nFor practitioners, unsupervised learning is valuable for initial data exploration and when labelled data isn’t available. 
It’s particularly useful in customer segmentation, anomaly detection, and dimension reduction.</p>\n<h3>\nReinforcement Learning: Learning Through Interaction</h3>\n<p>\n<em>This mindset models an agent interacting with an environment, making decisions and learning from rewards.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nAgent learns by taking actions and receiving rewards </li>\n <li>\nHandles delayed and sparse rewards </li>\n <li>\nBalances exploration and exploitation </li>\n <li>\nCreates its own training data through interaction </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nModels dynamic real-world interactions </li>\n <li>\nExcellent for sequential decision-making </li>\n <li>\nCan discover novel strategies </li>\n <li>\nLearns through direct experience </li>\n <li>\nCombines well with deep learning </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nNot all problems fit agent-environment framework </li>\n <li>\nOften unstable or difficult to train </li>\n <li>\nMay perform poorly in real-world conditions </li>\n <li>\nRequires careful reward design </li>\n <li>\nComplex implementation choices </li>\n</ol>\n<p>\nFor practitioners, reinforcement learning is valuable for problems involving sequential decisions or control, particularly in robotics, game playing, and resource management.</p>\n<h3>\nDeep Learning: End-to-End Neural Networks</h3>\n<p>\n<em>This mindset approaches problems through deep neural networks, letting the model learn both features and relationships.</em></p>\n<p>\nKey Aspects:</p>\n<ul>\n <li>\nModels tasks end-to-end through neural networks </li>\n <li>\nLearns hierarchical representations automatically </li>\n <li>\nHighly modular architecture design </li>\n <li>\nBenefits from transfer learning and pre-trained models </li>\n</ul>\n<p>\nPrimary Strengths:</p>\n<ol>\n <li>\nExcels at complex data (images, text, speech) </li>\n <li>\nLearns useful feature representations </li>\n <li>\nHighly modular and customisable 
</li>\n <li>\nStrong tooling and community support </li>\n <li>\nCan handle multiple inputs/outputs seamlessly </li>\n</ol>\n<p>\nNotable Limitations:</p>\n<ol>\n <li>\nUnderperforms on tabular data versus tree methods </li>\n <li>\nRequires large amounts of data </li>\n <li>\nComputationally intensive </li>\n <li>\nHard to train and tune effectively </li>\n <li>\nResults can be difficult to interpret </li>\n</ol>\n<p>\nFor practitioners, deep learning is essential for complex data types but may be overkill for simpler problems. It is most valuable in computer vision, natural language processing, and other complex pattern recognition tasks.</p>\n<h2>\nConclusion</h2>\n<p>\n<em><strong>aka Choosing Your Modelling Path</strong></em></p>\n<p>\nFor developing T-shaped expertise in modelling, the practitioner’s choice should align with their primary domain while maintaining broader awareness. Here’s how to approach this decision:</p>\n<ul>\n <li>\n <p>\n<em>Scientific Research</em> demands Statistical Modelling for its rigorous uncertainty quantification and established peer review frameworks. </p>\n </li>\n <li>\n <p>\n<em>Business Predictions</em> benefit most from Supervised Learning, optimising prediction accuracy while enabling automation and scalability. </p>\n </li>\n <li>\n <p>\n<em>Complex Data</em> (images/text) requires Deep Learning to handle unstructured data and learn hierarchical features effectively. </p>\n </li>\n <li>\n <p>\n<em>Interventions/Policies</em> need Causal Inference to distinguish correlation from causation and understand intervention effects. </p>\n </li>\n <li>\n <p>\n<em>Control Systems</em> thrive with Reinforcement Learning for sequential decisions and environment interaction. 
</p>\n </li>\n</ul>\n<p>\nFor practical applications, certain combinations prove particularly effective:</p>\n<ul>\n <li>\n <p>\n<em>Industry/Business</em> combines Supervised Learning with Unsupervised Learning, enabling accurate predictions while discovering valuable patterns in customer data. </p>\n </li>\n <li>\n <p>\n<em>Research</em> pairs Statistical Modelling with Machine Learning, balancing academic rigour with modern capabilities. </p>\n </li>\n <li>\n <p>\n<em>Product Development</em> merges Deep Learning with Supervised Learning for end-to-end features with clear metrics. </p>\n </li>\n <li>\n <p>\n<em>Medical Diagnostics</em> unites Supervised Learning with Statistical Modelling, crucial for evidence-based decisions with proper uncertainty quantification. </p>\n </li>\n</ul>\n<p>\nThe choice should be based on the practitioner’s domain requirements, computational resources, interpretability needs, and available time for mastery. <em>Remember:</em> Mastery of one mindset with broad awareness surpasses superficial knowledge of many.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Because Bayesian models treat everything as probability distributions (rather than fixed values), any predictions or conclusions automatically include their associated uncertainty. For example, if you predict someone’s future income using multiple uncertain factors, the final prediction comes as a range of possibilities with their probabilities, rather than just a single number.</p>\n<p>\n[^2]: Here is a <a href=\"https://bsky.app/profile/chrisalbon.com/post/3lbendflq2w2n\">nice digital flashcard</a> by Chris Albon, on the concept of the <em>curse of dimensionality</em>.</p>\n",
"tags": [
"modelling-mindsets",
"data-science",
"ai",
"data-modeling",
"neural-network",
"best-practices",
"statistics",
"machine-learning",
"decision-making"
]
},
{
"date": "2024-11-26",
"title": "💡 TIL: Pydantic, Python's Data Validation Guard",
"url": "/posts/TIL-pydantic.html",
"content": "<p>\n<strong>TL;DR:</strong> Pydantic transforms Python’s type hints from passive documentation into active runtime validators, automatically converting and validating data types while providing intelligent error handling-significantly reducing boilerplate code and catching potential errors at system boundaries for more reliable API development, configuration management, and data processing pipelines.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nToday I started using <a href=\"https://docs.pydantic.dev/latest/\">Pydantic</a>, a Python library that handles data validation through Python type annotations. Pydantic brings runtime type checking and data validation that catches errors before they cause mysterious bugs in an application. It uses type hints to validate data at runtime, automatically converting and validating data types, preventing bugs, and reducing boilerplate code. It’s essential for robust API development, configuration management, and data processing pipelines.</p>\n<h2>\nUnderstanding Pydantic and Its Value</h2>\n<p>\nPydantic leverages Python’s type hints to validate data structures. It converts your type hints from mere documentation into active runtime checks, ensuring data consistency throughout your application. 
Here are Pydantic’s key features:</p>\n<h3>\nType Enforcement</h3>\n<pre><code class=\"python\">from pydantic import BaseModel\n\nclass User(BaseModel):\n    name: str\n    age: int\n    email: str\n\n# This raises a ValidationError because age cannot be parsed as an int\nuser = User(name=&quot;John&quot;, age=&quot;not_a_number&quot;, email=&quot;john@example.com&quot;)</code></pre>\n<h3>\nAutomatic Type Coercion</h3>\n<pre><code class=\"python\">from pydantic import BaseModel\n\nclass Order(BaseModel):\n    quantity: int\n    price: float\n\n# Pydantic automatically converts valid numeric strings to numbers\norder = Order(quantity=&quot;3&quot;, price=&quot;9.99&quot;)\nprint(order.quantity)  # 3 (int)\nprint(order.price)  # 9.99 (float)</code></pre>\n<h3>\nReal-World Benefits</h3>\n<ul>\n <li>\n<strong>API Development</strong>: Validates incoming JSON data automatically </li>\n <li>\n<strong>Configuration Management</strong>: Ensures config files meet your specifications </li>\n <li>\n<strong>Database Operations</strong>: Validates data before insertion </li>\n <li>\n<strong>Data Parsing</strong>: Converts between JSON, dictionaries, and model instances seamlessly </li>\n</ul>\n<h3>\nWhy It Matters</h3>\n<ol>\n <li>\n<strong>Error Prevention</strong>: Catches data issues at system boundaries </li>\n <li>\n<strong>Clean Code</strong>: Reduces validation boilerplate </li>\n <li>\n<strong>Self-Documenting</strong>: Type hints serve as both validation rules and documentation </li>\n <li>\n<strong>Performance</strong>: Compiled validation code runs efficiently </li>\n</ol>\n<h2>\nConclusion</h2>\n<p>\nPydantic transforms Python’s type hints from passive documentation into active data validators, significantly reducing runtime errors and improving code reliability.</p>\n",
"tags": [
"til",
"data-validation",
"python",
"type-checking",
"data-modeling",
"code-quality",
"pydantic",
"error-handling"
]
},
{
"date": "2024-11-23",
"title": "🏺 Historical Evolution of AI",
"url": "/posts/ai-evolution.html",
"content": "<p>\n<strong>TL;DR:</strong> This historical overview traces AI’s evolution through four major paradigms: from symbolic reasoning and expert systems (1950s-1970s), through neural networks and Bayesian approaches (1980s-1990s), to the Big Data revolution (2000s-2010s), culminating in today’s integrated systems that combine multiple philosophical approaches-suggesting future progress requires unifying these diverse methodologies rather than choosing between them.</p>\n<!--more-->\n<h1>\nAI’s Historical Evolution</h1>\n<h2>\nIntroduction</h2>\n<p>\nThe field of Artificial Intelligence has undergone several paradigm shifts since its inception, each representing a distinct approach to creating intelligent systems. Drawing from Pedro Domingos’ framework in <a href=\"https://en.wikipedia.org/wiki/The_Master_Algorithm\">The Master Algorithm</a> we can trace how different schools of thought have shaped our understanding and implementation of AI technologies.</p>\n<h2>\nHistorical Evolution</h2>\n<h3>\nEarly Foundations: Symbolic AI and Expert Systems (1950s-1970s)</h3>\n<p>\nThe pioneers of AI began with symbolic reasoning, believing intelligence could be reduced to symbol manipulation. This <em>symbolist</em> approach offered explicit reasoning chains and interpretability but struggled with real-world complexity. Expert Systems followed, successfully applying rule-based reasoning to narrow domains while revealing the challenges of scaling knowledge-based systems.</p>\n<h3>\nThe Rise of Neural Approaches (1980s-1990s)</h3>\n<p>\nThe <em>connectionist</em> movement emerged with neural networks, drawing inspiration from biological systems. This era introduced pattern recognition capabilities and learning from examples. 
Simultaneously, the <em>Bayesian</em> school brought statistical methods to the forefront, offering principled approaches to handling uncertainty but requiring significant data and computational resources.</p>\n<h3>\nThe Data Revolution (2000s-2010s)</h3>\n<p>\nBig Data and Deep Learning foundations emerged as the <em>analogiser</em> school gained prominence. This period saw the convergence of massive datasets, computational power, and sophisticated architectures. Deep Learning breakthroughs demonstrated the power of automatic feature learning, though at the cost of increased computational demands and reduced interpretability.</p>\n<h3>\nContemporary AI: The Era of Integration (2020s)</h3>\n<p>\nCurrent AI systems, particularly large language models, represent a synthesis of multiple schools. They combine symbolic reasoning[^1], neural architectures[^2], and statistical learning[^3], achieving impressive generative capabilities and few-shot learning. However, they face challenges in resource requirements, reliability, and alignment with human values. A nicely distilled overview of today’s AI comes from an <a href=\"https://xcancel.com/karpathy/status/1864033537479135369\">Andrej Karpathy tweet</a>.</p>\n<h2>\nConclusion</h2>\n<p>\nThe evolution of AI reveals a field shaped by competing philosophies, each contributing essential insights. As Domingos argues, the future likely lies not in the dominance of any single approach but in their unification. While recent advances demonstrate the potential of synthesising different methods, significant challenges remain in creating truly intelligent systems that are both powerful and reliable.</p>\n<p>\nThe path forward requires building on these foundations while addressing core challenges in efficiency, interpretability, and alignment. 
Rather than choosing between different schools of thought, the field must continue to integrate their strengths while mitigating their individual weaknesses.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Symbolic reasoning in modern AI manifests through attention mechanisms and transformers’ ability to process structured input like code or mathematical expressions. While not explicitly rule-based like early AI, these systems can learn and apply symbolic patterns.</p>\n<p>\n[^2]: Neural architectures in contemporary AI primarily use the transformer architecture, where self-attention layers process information in parallel, allowing the model to weigh the importance of different inputs contextually.</p>\n<p>\n[^3]: Statistical learning appears in the form of probabilistic token prediction and the use of large-scale statistical patterns learned during training. Models learn probability distributions over sequences, enabling them to generate coherent outputs.</p>\n",
"tags": [
"ai",
"evolution",
"llm",
"symbolic",
"neural-network",
"data-science"
]
},
{
"date": "2024-11-22",
"title": "🔄 Considering Iterative Refinement Over Unit Testing",
"url": "/posts/iterative-refinement.html",
"content": "<p>\n<strong>TL;DR:</strong> Drawing inspiration from Norvig, Howard, and Sanderson, this article advocates for iterative refinement over traditional unit testing, emphasising techniques like doctests that keep verification close to code-reducing maintenance burden whilst improving reliability by focusing on actual usage patterns rather than rigid test-driven development that can lead to outdated tests and ossified code structures.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nIn the realm of software development and related fields, three influential figures -Peter Norvig (former Director of Research at Google), Jeremy Howard (founder of fast.ai), and Grant Sanderson (creator of 3Blue1Brown)- demonstrate the power of iterative refinement over rigid test-driven development. Their approaches, while applied in different domains, share common principles that challenge traditional development practices.</p>\n<h2>\nIterative Refinement</h2>\n<h3>\nPeter Norvig’s Software Development</h3>\n<p>\nNorvig’s approach, demonstrated in both his <a href=\"https://norvig.com/docex.html\">original <code class=\"inline\">docex</code> module</a> and his <a href=\"https://norvig.com/spell-correct.html\">spell corrector implementation</a>, emphasises tests that are tightly coupled with the code they verify. 
Before Python’s doctests[^1] were officially supported, he created the <code class=\"inline\">docex</code> module specifically to write tests in docstrings using a concise syntax. With today’s standard-library <code class=\"inline\">doctest</code>, the same idea looks like this:</p>\n<pre><code class=\"python\">def factorial(n):\n    """Return the factorial of n, an exact integer >= 0.\n\n    >>> [factorial(n) for n in range(6)]\n    [1, 1, 2, 6, 24, 120]\n\n    It must also not be ridiculously large:\n\n    >>> factorial(1e100)\n    Traceback (most recent call last):\n        ...\n    OverflowError: n too large\n    """\n    # Illustrative body (not Norvig's code), added so the doctests above can run\n    import math\n    if not n >= 0:\n        raise ValueError("n must be >= 0")\n    if math.floor(n) != n:\n        raise ValueError("n must be exact integer")\n    if n + 1 == n:  # catches a value like 1e100\n        raise OverflowError("n too large")\n    result = 1\n    for factor in range(2, int(n) + 1):\n        result *= factor\n    return result\n\nif __name__ == "__main__":\n    import doctest\n    doctest.testmod()</code></pre>\n<pre><code class=\"console\">$ python fact.py -v\nTrying:\n    [factorial(n) for n in range(6)]\nExpecting:\n    [1, 1, 2, 6, 24, 120]\nok\nTrying:\n    factorial(1e100)\nExpecting:\n    Traceback (most recent call last):\n        ...\n    OverflowError: n too large\nok\n2 items passed all tests:\n   1 test in __main__\n   6 tests in __main__.factorial\n7 tests in 2 items.\n7 passed.\nTest passed.\n$</code></pre>\n<p>\nEven in his spell corrector, Norvig uses simple functions with in-line test cases rather than separate test files. 
This approach keeps tests close to the code they verify, making them part of the living documentation rather than separate artefacts that can drift out of sync.<br>\n<em>Update: While randomly skimming through PyTorch code, it was good to stumble across examples of <a href=\"https://github.com/pytorch/pytorch/blob/main/torch/autograd/grad_mode.py\">code containing doctests</a>.</em></p>\n<h3>\nJeremy Howard’s Machine Learning Development</h3>\n<p>\nHoward’s methodology, evidenced in fast.ai’s development and his book “Deep Learning for Coders”, advocates for rapid prototyping in notebooks. His emphasis lies in getting end-to-end solutions working quickly, then iteratively improving them based on actual usage patterns. 
In his latest course <a href=\"https://solveit.fast.ai/\">SolveIt</a>, Howard extends this iterative philosophy to <a href=\"/posts/dialogue-engineering.html\">Dialogue Engineering</a>, i.e. using Large Language Models in an iterative conversation to develop solutions, demonstrating how modern AI can be integrated into the development workflow while maintaining the principles of continuous refinement.</p>\n<h3>\nGrant Sanderson’s Visual Mathematics</h3>\n<p>\nThis iterative philosophy extends to mathematical animations. In Grant Sanderson’s <a href=\"https://www.youtube.com/watch?v=rbu7Zu5X1zI\">How I animate</a> video, he demonstrates how he builds visualisations incrementally, starting with basic shapes and gradually refining them while continuously previewing the results. This approach allows for creative exploration while maintaining momentum.</p>\n<h3>\nThe Problem with Traditional Testing</h3>\n<p>\nTraditional unit testing often fragments the development workflow by requiring separate test maintenance, and it can lead to ossified code structures. When tests aren’t exercised regularly, they become outdated, creating false confidence. This is particularly problematic in rapidly evolving domains like AI, where interfaces and requirements frequently change.</p>\n<h2>\nConclusion</h2>\n<p>\nInstead of extensive unit test suites, it’s worth considering:</p>\n<ol>\n <li>\nWriting working code first </li>\n <li>\nUsing doctests for critical functions </li>\n <li>\nRelying on end-to-end validation </li>\n <li>\nRefactoring based on actual usage patterns </li>\n <li>\nKeeping tests focused on stable interfaces </li>\n</ol>\n<p>\nThis approach reduces maintenance burden while ensuring code remains reliable where it matters most, that is, in production.</p>\n<p>\n“<em>Programs must be written for people to read, and only incidentally for machines to execute</em>” (Abelson & Sussman). 
The same applies to tests.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Python has shipped <a href=\"https://docs.python.org/3/library/doctest.html\">doctests</a> in its standard library since version 2.1.</p>\n",
"tags": [
"fast-ai",
"answer-ai",
"iterative-refinement",
"doctests",
"best-practices",
"llm",
"dialogue-engineering",
"code-quality"
]
},
{
"date": "2024-11-21",
"title": "シ Back to Basics: A Modern, Minimal Python Toolchain",
"url": "/posts/bring-it-back-to-basics.html",
"content": "<p>\n<strong>TL;DR:</strong> This article presents a streamlined Python toolchain that reduces cognitive load while maintaining the language’s data science capabilities, featuring Rust-based tools like uv (package manager) and Ruff (linter/formatter), along with pyright for type checking-all configured through a single pyproject.toml file and complemented by essential libraries for data processing, visualisation, and AI development.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nPython’s ecosystem for Data Science and AI is unmatched in its depth and maturity. Yet, its fragmented tooling landscape often leads to decision paralysis and opinions galore: virtualenv or venv? pip or conda? black or flake8? These choices, while providing flexibility, can create unnecessary cognitive load and often foster dogmatic opinions about “the right way” to do things. After exploring alternative stacks, I’m returning to Python. Not least because it’s perfect, but because it’s productive. The challenge isn’t Python’s capabilities; it’s the abundance and complexity of its tooling. This article presents a carefully curated, minimal toolkit that leverages Python’s ecosystem while avoiding its common setup pitfalls.</p>\n<h2>\nMotivation</h2>\n<p>\nThe appeal of integrated toolchains like Deno 2.0 is undeniable. Zero setup, immediate productivity, and a cohesive development experience. My recent exploration of alternative stacks revealed the value of unified tools that just work. While JavaScript’s ecosystem for Data Science and AI is growing rapidly, it still lacks the depth and maturity of Python’s scientific computing stack.\\ This exploration led to an important realisation: aside from an expansive Data and AI ecosystem, Python development can be achieved with a streamlined workflow that increases productivity and decreases complexity. 
Rather than accepting the cognitive overhead of multiple competing tools, I decided to create my own compact toolchain that meets most Data Science and AI requirements with minimalism, simplicity, and clarity in mind. The goal isn’t to prescribe another “right way” of doing things, but rather to demonstrate how a carefully chosen set of modern tools can create a development experience that rivals the integrated approaches of newer platforms while leveraging Python’s mature ecosystem.</p>\n<h2>\nMy Approach</h2>\n<h3>\nLocal Development</h3>\n<p>\nMy toolchain starts with the following foundational choices that eliminate common Python setup headaches:</p>\n<ol start=\"0\">\n <li>\n <p>\n<a href=\"https://peps.python.org/pep-0008/\">PEP8</a>: Let’s start with a style guide, so that the team is on the same page </p>\n </li>\n <li>\n <p>\n<a href=\"https://docs.astral.sh/uv/\">uv</a>: A blazing-fast Python package and project manager, written in Rust. It replaces pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more, providing: </p>\n <ul>\n <li>\nConsistent dependency resolution </li>\n <li>\nLightning-fast package installations </li>\n <li>\nBuilt-in virtual environment management </li>\n <li>\nDirect integration with <code class=\"inline\">pyproject.toml</code> </li>\n </ul>\n </li>\n <li>\n <p>\n<a href=\"https://packaging.python.org/en/latest/guides/writing-pyproject-toml/\"><code class=\"inline\">pyproject.toml</code></a>: The single source of truth for project configuration. 
For example: </p>\n </li>\n</ol>\n<pre><code class=\"toml\">[project]\nname = "my-ds-project"\nversion = "0.1.0"\ndependencies = [\n    "polars",\n    "tensorflow",\n    "plotly"\n]\n\n[tool.ruff]\nline-length = 90\n\n# Recent Ruff releases expect rule selection under [tool.ruff.lint]\n[tool.ruff.lint]\nselect = ["E", "F", "I"]\n\n# Required only if you use pytest for unit testing\n[tool.pytest.ini_options]\ntestpaths = ["tests"]</code></pre>\n<ol start=\"3\">\n <li>\n <p>\n<a href=\"https://docs.astral.sh/ruff/\">Ruff</a>: A Rust-based tool that combines formatting and linting, replacing the need for black, flake8, isort etc.: </p>\n <ul>\n <li>\nSingle-tool code quality enforcement </li>\n <li>\nConfigurable through <code class=\"inline\">pyproject.toml</code> </li>\n <li>\nSignificantly faster than Python-based alternatives </li>\n </ul>\n </li>\n <li>\n <p>\n<a href=\"https://github.com/microsoft/pyright\">pyright</a>: Static type checker for Python: </p>\n <ul>\n <li>\n<a href=\"https://htmlpreview.github.io/?https://github.com/python/typing/blob/main/conformance/results/results.html\">Standards</a> compliant </li>\n <li>\n<a href=\"https://microsoft.github.io/pyright/#/configuration?id=sample-pyprojecttoml-file\">Configurable</a> within <code class=\"inline\">pyproject.toml</code> </li>\n </ul>\n </li>\n <li>\n <p>\n<a href=\"/posts/iterative-refinement.html\">iterative refinement</a>: An approach that tightly couples (doc)tests with code, ensuring <a href=\"https://www.merriam-webster.com/thesaurus/up-to-dateness\">up-to-dateness</a><br>\n<del><a href=\"https://docs.pytest.org/en/stable/\">pytest</a>: Handles testing with minimal boilerplate and rich assertions</del> </p>\n </li>\n</ol>\n<h3>\nCross-Platform Distribution</h3>\n<ol>\n <li>\nPyInstaller for creating stand-alone executables </li>\n <li>\nGitHub Actions workflow for automated builds: </li>\n</ol>\n<pre><code class=\"yaml\">- name: Build executables\n  run: |\n    pyinstaller --onefile 
src/main.py</code></pre>\n<ol start=\"3\">\n <li>\nLocal cross-compilation using <a 
href=\"https://podman.io/\">Podman</a>: </li>\n</ol>\n<pre><code class=\"Dockerfile\">FROM python:3.13-slim\nCOPY . /app\nWORKDIR /app\nRUN pip install pyinstaller\nCMD pyinstaller --onefile src/main.py</code></pre>\n<h3>\nData Science</h3>\n<p>\nA carefully selected set of powerful libraries that minimise overlap:</p>\n<ul>\n <li>\n<a href=\"https://pola.rs/\">Polars</a>: Fast DataFrame operations with a cohesive API. <br>\n<a href=\"https://xcancel.com/charliermarsh/status/1860388882015223835\">Why?</a> </li>\n</ul>\n<pre><code class=\"python\">import polars as pl\n\ndef analyse_customer_behavior(path: str):\n    return (\n        pl.scan_parquet(path)\n        .with_columns([\n            pl.col("purchase_date").str.to_datetime(),\n            (pl.col("amount") * pl.col("quantity")).alias("total_spend")\n        ])\n        .group_by([\n            pl.col("customer_id"),\n            pl.col("purchase_date").dt.month().alias("month")\n        ])\n        .agg([\n            pl.col("total_spend").sum().alias("monthly_spend"),\n            pl.col("product_id").n_unique().alias("unique_products"),\n            pl.col("purchase_date").count().alias("purchase_frequency")\n        ])\n        .sort(["customer_id", "month"])\n        .collect()\n    )</code></pre>\n<ul>\n <li>\n<a href=\"https://www.tensorflow.org/\">TensorFlow 2</a>: Deep learning when needed </li>\n</ul>\n<pre><code class=\"python\">import tensorflow as tf\nmnist = tf.keras.datasets.mnist\n\n(x_train, y_train), (x_test, y_test) = mnist.load_data()\nx_train, x_test = x_train / 255.0, x_test / 255.0\n\nmodel = tf.keras.models.Sequential([\n    tf.keras.layers.Flatten(input_shape=(28, 28)),\n    tf.keras.layers.Dense(128, activation='relu'),\n    tf.keras.layers.Dropout(0.2),\n    tf.keras.layers.Dense(10, activation='softmax')\n])\n\n# Keras spells this keyword "optimizer"; "optimiser" is rejected as unknown\nmodel.compile(optimizer='adam',\n              loss='sparse_categorical_crossentropy',\n              
metrics=['accuracy'])\n\nmodel.fit(x_train, y_train, epochs=5)\nmodel.evaluate(x_test, y_test)</code></pre>\n<ul>\n <li>\n<a href=\"https://xgboost.ai/\">XGBoost</a>: Gradient boosting for structured data </li>\n</ul>\n<pre><code 
class=\"python\">from xgboost import XGBClassifier\n# read data\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\ndata = load_iris()\nX_train, X_test, y_train, y_test = train_test_split(data['data'], data['target'], test_size=.2)\n# create model instance\nbst = XGBClassifier(n_estimators=2, max_depth=2, learning_rate=1, objective='binary:logistic')\n# fit model\nbst.fit(X_train, y_train)\n# make predictions\npreds = bst.predict(X_test)</code></pre>\n<ul>\n <li>\n<a href=\"https://plotly.com/python/\">Plotly</a>: Interactive visualizations </li>\n</ul>\n<pre><code class=\"python\">import plotly.express as px\ndf = px.data.iris()\nfig = px.scatter(df, x="sepal_width", y="sepal_length", color="species", symbol="species")\nfig.show()</code></pre>\n<ul>\n <li>\n<a href=\"https://mlflow.org\">MLFlow</a>: Managing the Machine Learning Lifecycle </li>\n</ul>\n<center>\n <img src=\"https://raw.githubusercontent.com/ai-mindset/ai-mindset.github.io/refs/heads/main/images/40_MLFlow.png\"/></center>\n<br /><h3>\nAI Engineering</h3>\n<p>\nWith hybrid solutions becoming more prevalent nowadays, we can use a combination of tools.</p>\n<ul>\n <li>\n<a href=\"https://ollama.com/\">Ollama</a>: Local model deployment and inference </li>\n</ul>\n<pre><code class=\"python\"> import ollama\n\n def technical_advisor():\n messages = [\n {\n "role": "system",\n "content": "You are a technical advisor specializing in Python architecture."\n },\n {\n "role": "user",\n "content": "What's the best way to handle database migrations?"\n }\n ]\n\n response = ollama.chat(model='llama2', messages=messages)\n messages.append(response['message'])\n\n # Follow-up question with context\n messages.append({\n "role": "user",\n "content": "How would that work with SQLAlchemy specifically?"\n })\n\n return ollama.chat(model='llama2', messages=messages)</code></pre>\n<ul>\n <li>\n<a href=\"https://docs.llamaindex.ai/\">LlamaIndex</a>: RAG pipeline construction 
using local LLMs or external APIs </li>\n</ul>\n<pre><code class=\"python\">from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\nfrom llama_index.core.node_parser import SentenceSplitter\nfrom llama_index.core.postprocessor import SimilarityPostprocessor\nfrom llama_index.core.retrievers import VectorIndexRetriever\nfrom llama_index.core.query_engine import RetrieverQueryEngine\n\ndef create_custom_rag():\n    # Load and parse documents\n    documents = SimpleDirectoryReader("technical_docs").load_data()\n    parser = SentenceSplitter(chunk_size=1024, chunk_overlap=20)\n    nodes = parser.get_nodes_from_documents(documents)\n\n    # Create index from the parsed nodes\n    index = VectorStoreIndex(nodes)\n\n    # Retrieve the three most similar nodes\n    retriever = VectorIndexRetriever(index=index, similarity_top_k=3)\n\n    # `filters` expects metadata filters, not a score predicate, so the\n    # 0.7 similarity threshold is applied by a postprocessor instead\n    query_engine = RetrieverQueryEngine(\n        retriever=retriever,\n        node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],\n    )\n    return query_engine</code></pre>\n<ul>\n <li>\n<a href=\"https://www.mongodb.com/\">MongoDB</a>: A distributed document DB that supports vector storage and graph operations </li>\n</ul>\n<pre><code class=\"python\">from pymongo import MongoClient\nimport numpy as np\n\ndef vector_search(text_embedding: np.ndarray, threshold: float = 0.8):\n    client = MongoClient("mongodb://localhost:27017/")\n    db = client.vector_db\n\n    pipeline = [\n        {\n            "$search": {\n                "index": "vector_index",\n                "knnBeta": {\n                    "vector": text_embedding.tolist(),\n                    "path": "embedding",\n                    "k": 5\n                }\n            }\n        },\n        # Expose the search score as a field first, so the next stage can filter on it\n        {\n            "$project": {\n                
"_id": 0,\n                "text": 1,\n                "score": {"$meta": "searchScore"}\n            }\n        },\n        {\n            "$match": {\n                "score": {"$gt": threshold}\n            }\n        }\n    ]\n\n    return list(db.documents.aggregate(pipeline))</code></pre>\n<p>\n<em>Update: Looking into <a href=\"https://weaviate.io/\">Weaviate</a> as an all-in-one DB solution.</em></p>\n<p>\nThis stack provides everything needed for modern Data Science and AI work while maintaining clarity and minimising tool 
overlap.</p>\n<h2>\nConclusion</h2>\n<p>\nReturning to Python with this minimal, modern toolchain has proven to be a pragmatic choice. The combination of uv, Ruff, and pyright creates a more unified development workflow, while retaining access to Python’s mature scientific computing ecosystem.</p>\n<p>\nKey benefits of this approach:</p>\n<ol>\n <li>\n<strong>Reduced Cognitive Load</strong>: One tool per task eliminates decision fatigue </li>\n <li>\n<strong>Modern Performance</strong>: Rust-based tools (uv, Ruff) provide near-instant feedback </li>\n <li>\n<strong>Simplified Configuration</strong>: Single <code class=\"inline\">pyproject.toml</code> as source of truth </li>\n <li>\n<strong>Production Ready</strong>: Direct path from development to cross-platform deployment </li>\n <li>\n<strong>Full Feature Set</strong>: Complete Data Science and AI capabilities without bloat </li>\n <li>\n<strong>Flexible AI Stack</strong>: Seamless integration between local models (Ollama), RAG pipelines (LlamaIndex), and vector storage (MongoDB) </li>\n <li>\n<strong>Production AI</strong>: Easy transition from experimentation to production AI systems with consistent tooling </li>\n</ol>\n<p>\nWhile Python’s ecosystem will likely remain fragmented, we don’t have to accept the complexity. By carefully choosing modern tools that prioritise speed, simplicity, and clarity, we can create a development environment that’s both powerful and pleasant to use.</p>\n<p>\nThe beauty of this approach lies not in its prescriptiveness, but in its principles: <em>minimise tooling</em>, <em>maximise capability</em>, and <em>maintain clarity</em>. Whether you adopt this exact stack or use it as inspiration for your own, the goal remains the same: bring the focus back to solving problems rather than managing tools.</p>\n",
"tags": [
"python",
"type-checking",
"code-quality",
"github-actions",
"ci-cd",
"cross-platform",
"minimal",
"toolchain"
]
},
{
"date": "2024-11-20",
"title": "💡 TIL: TF-IDF vs BM25",
"url": "/posts/TIL-BM25-TFIDF.html",
"content": "<p>\n<strong>TL;DR:</strong> While TF-IDF ranks documents based on term frequency weighted by rarity across a corpus, BM25 improves upon this foundation by adding term frequency saturation and document length normalisation. Choose TF-IDF for simpler tasks with uniformly-sized documents, but prefer BM25 for search engines handling varied document lengths where its sophisticated algorithm delivers superior retrieval performance despite requiring more complex implementation and parameter tuning.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nWhen building search engines or document retrieval systems, two algorithms often come up: <a href=\"https://en.wikipedia.org/wiki/Tf%E2%80%93idf\">TF-IDF</a> and <a href=\"https://en.wikipedia.org/wiki/Okapi_BM25\">Okapi BM25</a>. While both aim to rank documents by relevance, they differ significantly in their approach and effectiveness. Today, I learned the key differences between these techniques and when to use each one.</p>\n<h2>\nTF-IDF: The Classic Approach</h2>\n<p>\nTF-IDF (Term Frequency-Inverse Document Frequency) ranks documents based on how frequently terms appear in a document, weighted by how rare those terms are across all documents. It’s straightforward: if a word appears often in a document but is rare across the corpus, it’s probably important[^1]. 
$idf$ is calculated as follows:</p>\n<p>\n$$idf(t) = \\log\\frac{N}{n_t}$$</p>\n<p>\nwhere:<br>\n$N$ : Total number of documents in corpus<br>\n$n_t$ : Number of documents containing term $t$</p>\n<p>\nTF-IDF is derived by the following calculation:</p>\n<p>\n$$TF\\text{-}IDF(t,d) = tf(t,d) \\cdot idf(t)$$</p>\n<p>\nwhere:<br>\n$tf(t,d)$ : Frequency of term $t$ in document $d$</p>\n<h3>\nAdvantages</h3>\n<ul>\n <li>\nSimple to understand and implement </li>\n <li>\nComputationally efficient </li>\n <li>\nWorks well for documents of similar length </li>\n <li>\nGreat for basic document classification </li>\n</ul>\n<h3>\nDisadvantages</h3>\n<ul>\n <li>\nNo term frequency saturation (more occurrences always mean higher scores) </li>\n <li>\nDoesn’t handle varying document lengths well </li>\n <li>\nCan overemphasise common terms in long documents </li>\n</ul>\n<h2>\nBM25: The Modern Evolution</h2>\n<p>\nBM25 (Best Match 25) builds upon TF-IDF’s foundation but adds two crucial improvements: term frequency saturation and document length normalisation. 
Note how the $idf_{BM25}$ component differs from TF-IDF’s:</p>\n<p>\n$$idf_{BM25}(t) = \\log\\frac{N - n_t + 0.5}{n_t + 0.5}$$</p>\n<p>\nThis modification provides smoother IDF weights and better handles edge cases.</p>\n<p>\n$$BM25(t,d) = \\frac{tf(t,d) \\cdot (k_1 + 1)}{tf(t,d) + k_1 \\cdot (1 - b + b \\cdot \\frac{|d|}{avgdl})} \\cdot idf_{BM25}(t)$$</p>\n<p>\nwhere:<br>\n$tf(t,d)$ : Frequency of term $t$ in document $d$<br>\n$|d|$ : Length of document $d$ (in words)<br>\n$avgdl$ : Average document length in corpus<br>\n$k_1$ : Term frequency saturation parameter (typically 1.2-2.0)<br>\n$b$ : Length normalisation parameter (typically 0.75)<br>\n$N$ : Total number of documents in corpus<br>\n$n_t$ : Number of documents containing term $t$</p>\n<h3>\nAdvantages</h3>\n<ul>\n <li>\nBetter handles varying document lengths </li>\n <li>\nPrevents term frequency from dominating scores </li>\n <li>\nMore nuanced relevance rankings </li>\n <li>\nIndustry standard for search engines </li>\n</ul>\n<h3>\nDisadvantages</h3>\n<ul>\n <li>\nMore complex implementation </li>\n <li>\nRequires parameter tuning </li>\n <li>\nSlightly higher computational cost </li>\n <li>\nLess interpretable than TF-IDF </li>\n</ul>\n<h2>\nWhich to Choose?</h2>\n<h3>\nChoose TF-IDF when:</h3>\n<ul>\n <li>\nBuilding basic document classification systems </li>\n <li>\nWorking with uniformly-sized documents </li>\n <li>\nNeeding interpretable results </li>\n <li>\nPrioritising implementation simplicity </li>\n</ul>\n<h3>\nChoose BM25 when:</h3>\n<ul>\n <li>\nBuilding a search engine </li>\n <li>\nHandling documents of varying lengths </li>\n <li>\nRequiring state-of-the-art retrieval performance </li>\n <li>\nWorking with user queries </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\nWhile TF-IDF remains valuable for simpler tasks and educational purposes, BM25 is generally superior for serious search applications. The choice between them often comes down to the trade-off between simplicity and sophistication. 
For modern search engines, BM25 is the clear winner, but TF-IDF’s simplicity makes it perfect for learning and basic applications.</p>\n<p>\nRemember: the best algorithm is the one that meets your specific needs. Don’t automatically reach for BM25 just because it’s more advanced – sometimes, simpler is better.</p>\n<p>\n[^1]: This is why TF-IDF is effective at identifying characteristic terms in documents. It automatically downweights common words like “the”, “and”, “is” while highlighting distinctive terms that appear frequently in specific documents.</p>\n",
"tags": [
"til",
"tf-idf",
"bm25",
"text-ranking",
"nlp"
]
},
{
"date": "2024-11-15",
"title": "🆙 Level Up With Dialogue Engineering",
"url": "/posts/dialogue-engineering.html",
"content": "<p>\n<strong>TL;DR:</strong> Dialogue Engineering transforms AI interactions by replacing one-shot prompts with structured, multi-turn conversations that break complex tasks into manageable steps: setting scenarios, gathering information, creating structured outlines, generating content iteratively, and refining conclusions. This systematic approach dramatically improves productivity across research, business, and content creation while maintaining human oversight to address AI limitations like accuracy and contextual understanding.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nDialogue Engineering is transforming how we interact with AI[^1]. Rather than relying on one-shot prompts, it’s an iterative approach where we engage in structured, multi-turn conversations with LLMs (Large Language Models) to achieve complex goals. While I first encountered the term through Jeremy Howard[^2][^3], the concept has deeper roots in human-AI interaction research. Though Howard popularised it recently through fast.ai and answer.ai, the concept has been discussed since 1986[^4]. Dialogue engineering dramatically improves productivity by breaking down complex tasks, maintaining context across interactions, and guiding AI through iterative refinement. This systematic approach helps produce better results while reducing the cognitive load of prompt crafting. 
A nice overview of Dialogue Engineering comes from the Medium article <a href=\"https://medium.com/@fabioc/dialog-engineering-ai-as-your-research-assistant-616a625e9853\">Dialog Engineering: AI as Your Research Assistant</a>.<br>\nBelow, I’ll summarise what I inferred from that article.</p>\n<h2>\nHow Dialogue Engineering Works</h2>\n<ol>\n <li>\n<strong>Setting the Scenario</strong><br>\nThis first step involves defining clear objectives and research questions before engaging with AI. Rather than diving into broad topics, we frame specific goals and provide relevant context. For example, when starting a research project, we outline exactly what we need to investigate and any important background information the AI should consider.<br>\n<em>Best Practice:</em> Be clear and specific about goals, provide relevant background information to help AI understand context. </li>\n <li>\n<strong>Gathering Information</strong><br>\nOnce the scenario is set, we guide the AI in collecting and organising relevant data. This could involve creating annotated bibliographies, summarising key sources, or compiling research findings. The AI helps structure this information in a way that’s useful for the next steps.<br>\n<em>Best Practice:</em> Request structured formats like annotated bibliographies, ask for citations and evidence to ensure accuracy. </li>\n <li>\n<strong>Structuring the Outline</strong><br>\nBefore diving into content creation, we work with the AI to develop a clear roadmap. This outline breaks down the task into logical sections, ensuring a coherent flow and manageable chunks of work.<br>\n<em>Best Practice:</em> Break the task into clear sections, ensure logical connections between parts that reflect overall goals. </li>\n <li>\n<strong>Generating Content Iteratively</strong><br>\nWith the outline in place, we tackle each section individually through iterative refinement. 
Rather than expecting perfect content immediately, we provide feedback and guide the AI to improve its outputs progressively.<br>\n<em>Best Practice:</em> Work on sections individually to maintain focus, use feedback loops to guide AI toward more specific, accurate outputs. </li>\n <li>\n<strong>Conclusion and Introduction Refinement</strong><br>\nThe final step involves revisiting the opening and closing sections once the main content is complete. This ensures these crucial parts accurately reflect and synthesise the entire piece.<br>\n<em>Best Practice:</em> Write introduction last to accurately reflect content, craft conclusion by synthesising main takeaways from each section. </li>\n</ol>\n<p>\nThroughout all steps, I maintain active oversight, checking for accuracy and providing clear feedback. This systematic approach has dramatically improved my productivity while ensuring high-quality outputs.</p>\n<h2>\nPractical Applications</h2>\n<p>\nHere are the key areas where dialogue engineering proves particularly valuable:</p>\n<ul>\n <li>\n<strong>Academic Research</strong><br>\nResearchers can leverage dialogue engineering to synthesise vast amounts of information, structure complex arguments, and ensure accurate citations. The iterative approach is particularly useful for literature reviews and thesis development.<br>\n<em>Example:</em> A researcher prompts AI to generate an annotated bibliography on AI-driven diagnostics, focusing on recent studies, then iteratively refines the summaries and findings. </li>\n <li>\n<strong>Business Strategy and Reporting</strong><br>\nFor corporate applications, dialogue engineering helps generate market reports, analyse trends, and produce comprehensive strategy documents. 
This systematic approach ensures consistency while maintaining analytical depth.<br>\n<em>Example:</em> Business analysts use iterative prompts to draft sections of market reports, starting with “<em>Generate a section on e-commerce trends focusing on AI-driven personalisation</em>”, then refining based on specific data points. </li>\n <li>\n<strong>Report Automation</strong><br>\nDialogue Engineering excels at automating recurring business reports, such as quarterly financial reviews or performance summaries. The structured approach ensures consistency while allowing for customisation.<br>\n<em>Example:</em> Teams automate quarterly reports by structuring templates with AI, feeding relevant data, and using iterative refinement to maintain accuracy and freshness. </li>\n <li>\n<strong>Content Creation and Media</strong><br>\nContent creators can streamline the production of articles, blogs, and multimedia scripts through structured dialogue with AI. This approach particularly shines in drafting and revising content iteratively.<br>\n<em>Example:</em> Writers use dialogue engineering to draft introductory paragraphs, then iterate with prompts for more engaging language or additional examples. </li>\n <li>\n<strong>Technical Writing and Documentation</strong><br>\nIn fields requiring precise technical documentation, dialogue engineering helps ensure clarity, accuracy, and consistency across complex documents.<br>\n<em>Example:</em> Software engineers use dialogue engineering to draft technical documentation for new features, prompting “<em>Draft a technical overview of the new user authentication feature</em>”, then refining for clarity and technical accuracy. 
</li>\n</ul>\n<p>\nEach of these applications benefits from dialogue engineering’s structured, iterative approach, leading to more efficient workflows and higher-quality outputs.</p>\n<h2>\nBest Practices</h2>\n<p>\nKey best practices include:</p>\n<ul>\n <li>\n<strong>Precision in Prompts</strong><br>\nCraft prompts that are neither too vague nor overly specific. Focus on clear, well-structured queries that guide AI towards relevant outputs.<br>\n<em>Example:</em> Instead of “<em>Tell me about AI in healthcare</em>”, use “<em>What are the latest advancements in AI-driven diagnostics in healthcare, particularly in image recognition?</em>” </li>\n <li>\n<strong>Iterative Refinement</strong><br>\nBuild on each interaction, using feedback to improve outputs gradually rather than expecting perfection immediately.<br>\n<em>Example:</em> Start with a draft section, then refine with follow-up prompts like “<em>Expand on the use of dialogue engineering in business reporting, specifically market trend analysis.</em>” </li>\n <li>\n<strong>Leverage Feedback Loops</strong><br>\nMaintain continuous cycles of prompting, feedback, and refinement to improve output quality over time.<br>\n<em>Example:</em> When creating an outline, start broad, then use feedback to add specific sections on practical examples in different domains. </li>\n <li>\n<strong>Source and Citation Checking</strong><br>\nVerify AI-generated sources and citations manually, as AI models lack real-time access to databases.<br>\n<em>Example:</em> Cross-reference any cited statistics or research papers with trusted external sources before including them in final outputs. </li>\n <li>\n<strong>Structure Before Diving In</strong><br>\nCreate clear outlines or plans before generating detailed content to ensure logical flow and completeness.<br>\n<em>Example:</em> Start with a structured outline for a Medium article, then develop each section iteratively. 
</li>\n <li>\n<strong>Mind Token Limits</strong><br>\n
Break down long content into manageable chunks to work within AI model token limits.\\ <em>Example:</em> Generate long-form content section by section, refining each piece before moving to the next. </li>\n</ul>\n<p>\nHowever, we should be aware of the limitations (and challenges) of Dialogue Engineering too.</p>\n<h2>\nUnderstanding the Limitations</h2>\n<p>\nWhile these best practices enhance the use of dialogue engineering, it’s essential to acknowledge its constraints and challenges. Like any powerful tool, dialogue engineering comes with limitations that require careful consideration and management. Here’s what we need to keep in mind:</p>\n<h3>\nKey Limitations and Challenges</h3>\n<p>\nThe foremost concern when using generative AI is <em>accuracy</em> and <em>hallucinations</em>. LLMs can sometimes generate plausible-sounding but false information, necessitating rigorous fact-checking processes. This is particularly critical in professional contexts where accuracy is paramount. <em>Ethical implications</em> also demand attention. While AI can streamline work processes, maintaining authenticity and proper attribution is crucial. This connects directly to the need for <em>consistent human oversight</em>, that is users must <em>actively review outputs</em>, <em>ensure quality control</em>, and <em>make ethical judgements</em> about the content’s appropriateness and accuracy. AI’s current <em>limitations in understanding context and nuance</em> present another challenge. Models may struggle with subtle distinctions or produce oversimplified explanations, especially in specialised fields. Technical constraints, particularly token limits and handling complex, multi-layered reasoning tasks, further necessitate careful planning and task breakdown. 
These limitations underscore a crucial point: dialogue engineering works best as a <em>collaborative tool</em> that <em>enhances</em>, rather than replaces, human expertise and judgement.</p>\n<h2>\nConclusion</h2>\n<p>\nDialogue Engineering represents a significant evolution in human-AI interaction, moving beyond simple prompt engineering to create a dynamic, iterative approach. Through structured conversations and systematic refinement, it enables us to tackle complex tasks more efficiently across academic, business, and creative domains. While the technique requires careful attention to limitations like AI hallucinations and demands consistent human oversight, its power lies in treating AI as a collaborative partner rather than a one-shot tool. By following best practices and understanding its constraints, dialogue engineering becomes a force multiplier for productivity, helping us create better outputs while maintaining human expertise at the core of the process. This balance of systematic interaction and human judgement makes dialogue engineering a valuable framework for anyone looking to maximise the potential of AI tools in their workflow.</p>\n<hr class=\"thin\">\n<p>\n[^1]: AI is an umbrella term that has meant different things over the years. Since 2022, it has become synonymous with Generative AI. 
Here’s a short AI timeline: <a href=\"https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence\">Symbolic AI</a> (1950-60s), <a href=\"https://en.wikipedia.org/wiki/Expert_system\">Expert Systems</a> (1970s), <a href=\"https://en.wikipedia.org/wiki/Neural_network\">Neural Networks</a> and <a href=\"https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning\">Knowledge Representation</a> (1980s), <a href=\"https://en.wikipedia.org/wiki/Machine_learning\">Machine Learning</a> and <a href=\"https://en.wikipedia.org/wiki/Statistics\">Statistical Methods</a> (1990s), <a href=\"https://en.wikipedia.org/wiki/Big_data\">Big Data</a> and Deep Learning foundations (2000s), <a href=\"https://en.wikipedia.org/wiki/Deep_learning\">Deep Learning</a> (2010s), <a href=\"https://en.wikipedia.org/wiki/Generative_artificial_intelligence\">Generative AI</a> and <a href=\"https://en.wikipedia.org/wiki/Large_language_model\">Large Language Models</a> (2020s).</p>\n<p>\n[^2]: <a href=\"https://www.youtube.com/watch?v=qO-YqJm0Q1U&t=16\">Answer.ai & AI Magic with Jeremy Howard</a></p>\n<p>\n[^3]: <a href=\"https://www.answer.ai/posts/2024-11-07-solveit.html\">How To Solve It With Code</a></p>\n<p>\n[^4]: <em>Foundations of dialog engineering: the development of human-computer interaction. Part II</em> <a href=\"https://www.sciencedirect.com/science/article/pii/S0020737386800438\">(Gaines et al., 1986)</a></p>\n",
"tags": [
"ai",
"llm",
"dialogue-engineering",
"prompt",
"iterative-refinement",
"rag"
]
},
{
"date": "2024-11-14",
"title": "✅ On-boarding that works",
"url": "/posts/good-onboarding.html",
"content": "<p>\n<strong>TL;DR:</strong> Effective onboarding programmes dramatically reduce the productivity gap for new hires by applying evidence-based learning principles: breaking skills into manageable components, providing structured learning paths, targeting 80% proficiency before moving on, defining minimum productive competency, using checklists with frequent feedback, and focusing on mechanical competency- all based on how human memory and learning actually work, transforming the costly standard approach where developers take months to become fully productive.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nI recently watched a lively presentation titled <a href=\"https://www.youtube.com/watch?v=Og7NzaVpceE\">opinionated onboarding</a>, that discussed the dramatic impact of onboarding practices on new staff and business success. The speaker, drawing from his experience, articulated that poor onboarding is actively harming companies while good onboarding can transform team productivity and retention. Having experienced both ends of the spectrum myself, I strongly agree with the presenter’s central thesis. Companies can’t afford to waste months getting new staff up to speed, yet that’s what’s happening very frequently. The costs are staggering, both in lost productivity and squandered talent.</p>\n<h2>\nThe Problem</h2>\n<p>\nNew employees face an overwhelming cognitive burden as they simultaneously navigate multiple learning curves: mastering the tech stack, deciphering an unfamiliar codebase, adapting to team workflows, understanding the business domain, and learning organisational structures. Many companies worsen this situation through ineffective approaches, either expecting staff to self-direct their learning with vague instructions like “go learn X and tell us when you’re done”, or by immediately assigning them tickets without proper context or support. This inefficiency comes at a significant cost though. 
According to the presenter, the average software developer stays at a company for only 20 months, so taking 6 months to become productive means losing nearly a third of their tenure[^1]. Rather than fixing their onboarding processes, many companies respond by exclusively hiring senior professionals who can “hit the ground running”, an approach that not only limits their talent pool but proves unrealistic even for experienced hires.</p>\n<h2>\nEffective On-boarding Strategies</h2>\n<p>\nAccording to the presenter, research and experience show that effective onboarding follows clear cognitive science principles. Rather than overwhelming new hires with broad, unfocused learning objectives, successful onboarding programmes recognise how human learning actually works and structure their approach accordingly. Two key principles emerge as foundational to any effective onboarding strategy:</p>\n<ol>\n <li>\nFocus on building specific capabilities rather than general “understanding” </li>\n <li>\nAccount for cognitive limitations: <ul>\n <li>\nPeople can only hold ~4 concepts in working memory </li>\n <li>\nNew concepts take more mental space than familiar ones </li>\n <li>\nSkills must be practised close to when they’re learned </li>\n </ul>\n </li>\n</ol>\n<h3>\nBest Practices</h3>\n<p>\nThese principles translate into concrete best practices that any organisation can implement. While the specific skills and technologies may vary between teams, successful onboarding programmes share common structural elements that maximise learning efficiency while minimising cognitive overload:</p>\n<ul>\n <li>\n<strong>Break down complex skills into smaller, manageable components</strong>. For example, rather than asking someone to “learn LiveView”[^2], break it down into specific tasks like creating forms, handling events, or managing state. </li>\n <li>\n<strong>Provide structured learning paths rather than self-directed exploration</strong>. 
When new hires must decide what to learn next, they waste valuable mental capacity on planning rather than learning. A clear, predefined path eliminates this overhead. </li>\n <li>\n<strong>Aim for 80% proficiency before moving to next skill</strong>. This threshold ensures sufficient mastery while avoiding diminishing returns from pursuing perfection. </li>\n <li>\n<strong>Define minimum productive competency for the role</strong>. 
Not every skill needs to be mastered immediately; identify what’s truly needed for day-one productivity and focus there first. </li>\n <li>\n<strong>Use checklists and frequent practice with feedback</strong>. Clear checkpoints provide motivation and progress tracking, while regular feedback prevents learners from developing incorrect habits. </li>\n <li>\n<strong>Focus on mechanical competency to reduce cognitive load</strong>. When basic operations become automatic, developers can focus their mental energy on solving more complex problems. </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\nEffective onboarding is not just a nice-to-have; it’s a competitive necessity in today’s software industry. While poor onboarding practices continue to cost companies valuable time and talent, the path to improvement is clear. By <em>breaking down</em> complex skills, providing <em>structured learning</em> paths, and <em>respecting</em> cognitive limitations, organisations can dramatically reduce the time it takes for new hires to become productive team members. This investment in structured onboarding not only accelerates developer productivity but also expands hiring possibilities, allowing companies to tap into a broader talent pool. 
The choice is simple: continue losing months of productivity to ineffective onboarding, or implement these evidence-based practices to build stronger, more capable engineering teams.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Since the presentation focused on software developers, I use them here as a proxy for various technical positions, including AI Engineering, Data Science and others. Also, the tenure statistic may be skewed towards the U.S. market; however, it’s true that many employees job hop in pursuit of a higher salary or a better job altogether.</p>\n<p>\n[^2]: <a href=\"https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html\">Phoenix LiveView</a> “<em>is a process that receives events, updates its state and renders updates to a page as diffs</em>”</p>\n",
"tags": [
"onboarding",
"best-practices",
"productivity",
"company-culture",
"decision-making",
"talent",
"learning",
"efficiency"
]
},
{
"date": "2024-11-14",
"title": "💪 The Advantage",
"url": "/posts/the-advantage.html",
"content": "<p>\n<strong>TL;DR:</strong> Patrick Lencioni’s “The Advantage” argues that organisational health-built through leadership teams founded on trust, clear communication, and aligned values-is the single greatest competitive advantage companies can achieve, outweighing traditional “smart” business strategies and metrics for sustainable long-term success.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nRecently I listened to a <a href=\"https://saasscalingsecrets.buzzsprout.com/2172375/episodes/15926541-why-slower-growth-could-be-your-fast-track-to-success-with-roan-lavery-ceo-of-freeagent\">successful company’s CEO interview</a>, where he explained how growing sustainably contributed to the company’s success. The CEO said that their growth strategy was inspired by <a href=\"https://www.tablegroup.com/pat/\">Patrick Lencioni</a>‘s book <a href=\"https://www.tablegroup.com/product/the-advantage/\">The Advantage</a>. I found the book’s premise very interesting, hence I’ll attempt to summarise important points as a note to self.</p>\n<h2>\nCentral Thesis</h2>\n<p>\nOrganisational health is the <em>single greatest competitive advantage</em> a company can achieve, yet it’s often overlooked in favour of “smart” business decisions.</p>\n<h2>\nKey components of organisational health</h2>\n<h3>\nLeadership team structure</h3>\n<ul>\n <li>\nOptimal size: 3-10 people </li>\n <li>\nBuilt on trust and vulnerability </li>\n <li>\nValues collective success over individual achievement </li>\n</ul>\n<h3>\nCore principles</h3>\n<ul>\n <li>\nTrust and vulnerability as foundations </li>\n <li>\nAccountability at all levels </li>\n <li>\nCommitment to collective goals </li>\n <li>\nClear communication and expectations </li>\n</ul>\n<h3>\nOperational excellence</h3>\n<ul>\n <li>\nRegular, focused meetings with specific purposes </li>\n <li>\nClear distinction between strategic and tactical discussions </li>\n <li>\nEmphasis on debate and healthy conflict resolution </li>\n 
<li>\nContinuous progress monitoring </li>\n</ul>\n<h3>\nPeople and culture</h3>\n<ul>\n <li>\nHire for cultural fit rather than training after hiring </li>\n <li>\nFocus on values alignment in recruitment </li>\n <li>\nReward behaviour that aligns with organisational values </li>\n <li>\nFoster an environment of open communication </li>\n</ul>\n<h3>\nCritical success factors</h3>\n<ul>\n <li>\nMinimal internal politics </li>\n <li>\nHigh clarity in communication </li>\n <li>\nClear decision-making processes </li>\n <li>\nLow employee turnover </li>\n <li>\nHigh morale and productivity </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\nThe book’s fundamental message is that creating a healthy organisation through strong leadership, clear communication, and aligned values is more important than traditional business metrics for long-term success. It’s not about being the “smartest” in the market, but about creating the healthiest internal environment.</p>\n",
"tags": [
"slow-down",
"advantage"
]
},
{
"date": "2024-11-14",
"title": "🖥 The On-Prem Comeback (aka Cloud Repatriation)",
"url": "/posts/cloud-repatriation.html",
"content": "<p>\n<strong>TL;DR:</strong> Cloud repatriation-the strategic migration of applications and data from public clouds back to on-premises infrastructure-is gaining traction as organisations seek better cost control, performance, data privacy, and reduced vendor lock-in, with companies like 37signals projecting significant savings whilst maintaining a balanced hybrid approach rather than abandoning cloud entirely.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nMore recently, a notable shift is emerging in how organisations approach their cloud infrastructure. Some companies are beginning to move their applications and data away from public cloud providers like AWS, GCP and Azure. This marks a shift in the computing landscape.</p>\n<h2>\nWhat Does Cloud Repatriation Mean?</h2>\n<p>\nCloud repatriation refers to the process of moving applications, services and data from public cloud environments back to on-premises data centres, private clouds or hybrid set-ups. This reverse migration represents a pivot from the “cloud-first” mindset that has dominated in the last few years.</p>\n<h2>\nWhy It’s Happening</h2>\n<p>\nSeveral key factors are driving this trend. For larger companies, cost is a real consideration, with scale-ups such as 37signals projecting savings of £2 million p.a. by leaving AWS. Performance issues and rising cloud costs have led major organisations like GEICO to reconsider their cloud strategy after experiencing <br>\n2.5x increases in their bills alongside reliability challenges. <em>Data privacy</em>, <br>\n<em>compliance</em> requirements and the desire to avoid <em>vendor lock-in</em> are also significant motivators aside from growing costs. Many organisations are finding that running certain workloads on-premises or in hybrid environments offers better control over their infrastructure and data. 
See <a href=\"https://www.youtube.com/watch?v=kyJJeik9loU\">optimising infrastructure for AI</a> for a nice overview on cloud<->on-prem.</p>\n<h2>\nConclusion</h2>\n<p>\nCloud repatriation isn’t about completely abandoning public clouds but rather about finding the right balance. For organisations with predictable workloads and sufficient technical expertise, a strategic combination of on-premises, private cloud and public cloud infrastructure might prove more effective than a public-cloud-only approach.</p>\n",
"tags": [
"cloud",
"on-prem"
]
},
{
"date": "2024-11-14",
"title": "🐢 Slow Down And Grow Smart, Not Fast",
"url": "/posts/slow-down.html",
"content": "<p>\n<strong>TL;DR:</strong> Sustainable business growth prioritises measured expansion over explosive scaling, focusing on leadership development, strategic capital deployment, organisational health through clear alignment, and strong communication-creating resilient companies where profitability, employee retention, and team autonomy become key metrics of success rather than breakneck speed.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nIn an industry obsessed with “move fast and break things,” some companies prove that measured growth creates lasting success. I was happy to listen to the <a href=\"https://saasscalingsecrets.buzzsprout.com/2172375/episodes/15926541-why-slower-growth-could-be-your-fast-track-to-success-with-roan-lavery-ceo-of-freeagent\">CEO of a successful company</a> recounting their success, which was largely thanks to slow and steady growth.</p>\n<h2>\nSustainable Development Over Explosive Growth</h2>\n<p>\nIt’s safe to say that consistency and growing at a controlled pace is a recipe for a successful sustainable [insert word here]. This strategy applies to most things in life, in my view. Here is what caught my attention from this discussion:</p>\n<ul>\n <li>\nThe company grew at a pace that allowed leaders to develop alongside the </li>\n <li>\nThey focused on reaching profitability within 18-24 months after each funding </li>\n <li>\nThey raised capital from a position of strength, not necessity </li>\n <li>\nCapital was then deployed strategically rather than burning through runway <br>\nbusiness round </li>\n</ul>\n<h2>\nBuilding Strong Foundations</h2>\n<p>\nOrganisational health was centred around <em>clear alignment</em> through frameworks like <a href=\"https://www.tablegroup.com/product/the-advantage/\">The Advantage</a>[^1]. Company values have been <em>embedded into daily processes</em>. Goals are <em>integrated into regular team workflows</em>. 
Finally, maintaining <em>strong communication across all levels</em> has been key to the company’s success.</p>\n<h2>\nCulture And Retention</h2>\n<p>\nSuccess metrics go beyond financial growth. Key employees have stayed with the company <em>long-term</em>. Teams have maintained <em>autonomy</em> while staying <em>accountable through balanced scorecards</em>. Leadership has focused on creating <em>clarity</em> and <em>empowering teams</em>. Regular reinforcement of values has been achieved through <em><a href=\"https://youtube.com/watch?v=Og7NzaVpceE\">good onboarding</a></em>[^2], and daily operations.</p>\n<h2>\nKey Takeaways</h2>\n<ul>\n <li>\n<strong>Match growth to capability</strong>: Ensure your organisation can sustainably </li>\n <li>\n<strong>Focus on fundamentals</strong>: Build strong processes and systems that scale </li>\n <li>\n<strong>Invest in people</strong>: Give teams time and resources to develop alongside the </li>\n <li>\n<strong>Deploy capital wisely</strong>: Prioritise sustainable growth over rapid cash burn <br>\nsupport its growth rate gradually company </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\n<strong>This is the TL;DR really:</strong> Building a successful tech company doesn’t require breakneck speed or unsustainable growth. Smart, measured expansion with a focus on people and processes creates resilient businesses that stand the test of time.</p>\n<hr class=\"thin\">\n<p>\n[^1]: “_[the author] makes an overwhelming case that organisational health will</p>\n<pre><code>surpass all other disciplines in business as the greatest opportunity for\nimprovement and competitive advantage._"</code></pre>\n<p>\n[^2]: Characteristics of a good onboarding program: a) One thing at a time, b)</p>\n<pre><code>Lots of practice (with feedback), c) Attain proficiency, then move forward,\nd) The learner follows rather than guides, e) "Minimum Productive\nCompetency"</code></pre>\n",
"tags": [
"slow-down",
"advantage",
"best-practices",
"company-culture",
"productivity",
"decision-making",
"business-value",
"real-value"
]
},
{
"date": "2024-11-13",
"title": "💡 TIL: vLLM Is A High-Performance Engine For LLM Serving",
"url": "/posts/TIL-vLLM.html",
"content": "<p>\n<strong>TL;DR:</strong> vLLM revolutionises LLM deployment through its PagedAttention algorithm, which applies virtual memory principles to key-value caches, enabling more efficient memory management and significantly improving throughput for resource-constrained environments whilst supporting popular open-source models.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nAs a Data Scientist / AI Engineer exploring local-first solutions[^1][^2], deploying Large Language Models (LLMs) presents significant resource management challenges. vLLM emerges as a breakthrough solution that fundamentally reimagines how we deploy and utilise these resource-intensive models[^4].</p>\n<h2>\nWhat Is vLLM?</h2>\n<p>\nvLLM is an open-source serving engine that optimises LLM deployment through virtualisation techniques[^4]. At its core, vLLM introduces PagedAttention, a novel attention algorithm that improves memory utilisation through paged memory management[^3]. Similar to how operating systems manage virtual memory, PagedAttention segments the key-value memory into non-continuous pages, enabling more efficient memory usage and request handling.</p>\n<p>\nKey features[^4]:</p>\n<ul>\n <li>\nEfficient memory management through PagedAttention </li>\n <li>\nContinuous batching for request handling </li>\n <li>\nSupport for popular open-source models (Llama, Mistral, Falcon) </li>\n</ul>\n<h2>\nImplementation</h2>\n<p>\nHere’s a simple example of using vLLM:</p>\n<pre><code class=\"python\">from vllm import LLM, SamplingParams\n\n# Initialise the model\nllm = LLM(model="meta-llama/Llama-3.1-8B")\n\n# Define sampling parameters\nsampling_params = SamplingParams(\n temperature=0.7,\n max_tokens=128\n)\n\n# Generate text\noutputs = llm.generate(["Your prompt goes here"], sampling_params)</code></pre>\n<h2>\nApplications</h2>\n<p>\nAccording to industry analysis[^4], vLLM’s applications span multiple domains:</p>\n<ol>\n <li>\n <p>\nNatural Language Processing </p>\n <ul>\n 
<li>\nEnhances chatbots and sentiment analysis </li>\n <li>\nImproves language translation services </li>\n </ul>\n </li>\n <li>\n <p>\nHealthcare </p>\n <ul>\n <li>\nEnables secure patient data analysis </li>\n <li>\nAssists in medical diagnostics </li>\n </ul>\n </li>\n <li>\n <p>\nFinancial Services </p>\n <ul>\n <li>\nPowers fraud detection systems </li>\n <li>\nEnhances automated customer service </li>\n </ul>\n </li>\n <li>\n <p>\nEducation </p>\n <ul>\n <li>\nFacilitates intelligent tutoring systems </li>\n <li>\nEnables automated assessment tools </li>\n </ul>\n </li>\n</ol>\n<h2>\nBest Practices for Implementation[^4]</h2>\n<p>\nFor optimal vLLM deployment:</p>\n<ul>\n <li>\nImplement model optimisation techniques </li>\n <li>\nUtilise containerisation for scalable deployment </li>\n <li>\nMaintain robust monitoring systems </li>\n <li>\nOptimise performance regularly </li>\n</ul>\n<h2>\nConclusion</h2>\n<p>\nvLLM represents a significant advancement in LLM serving technology[^3], offering an efficient, scalable solution for resource-constrained environments. Its innovative approach to memory management through PagedAttention and its broad applicability across industries make it an essential tool for modern AI development.</p>\n<hr class=\"thin\">\n<p>\n[^1]: <a href=\"https://www.puppet.com/blog/cloud-repatriation\">Cloud Repatriation: Examples, Unpacking 2024 Trends & Tips for Reverse Migration</a></p>\n<p>\n[^2]: <a href=\"https://thenewstack.io/why-companies-are-ditching-the-cloud-the-rise-of-cloud-repatriation/\">Why Companies Are Ditching the Cloud: The Rise of Cloud Repatriation</a></p>\n<p>\n[^3]: Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C.H., Gonzalez, J.E., Zhang, H., and Stoica, I. (2023). “Efficient Memory Management for Large Language Model Serving with PagedAttention.” <a href=\"https://arxiv.org/abs/2309.06180\">arXiv:2309.06180</a>.</p>\n<p>\n[^4]: <a href=\"https://aijobs.net/insights/vllm-explained/\">vLLM Explained</a></p>\n",
"tags": [
"til",
"llm",
"ai",
"python",
"on-prem",
"performance"
]
},
{
"date": "2024-11-11",
"title": "🔀 Cross-Platform Builds In Python",
"url": "/posts/py-cross-compile.html",
"content": "<p>\n<strong>TL;DR:</strong> Creating cross-platform Python application packages requires CI/CD solutions like GitHub Actions since tools like PyInstaller can’t natively build for multiple platforms; alternatives like Julia and Elixir offer promising but still-maturing packaging options, while Deno emerges as an appealing alternative with its straightforward cross-platform packaging capabilities, lightweight footprint, and growing data ecosystem- though Python remains dominant for data analysis despite its packaging limitations.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nIn the last couple of years I’ve spoken to 3-4 people who had needed a bespoke data analysis tool that could be used locally, with privacy in mind. In some cases they’d need to work in a sandboxed environment for security reasons, other times they had IP protection concerns. A desktop app or a CLI tool[^1] seemed to fit the bill in all those cases.\\ In the last decade+, Data and Python have become a synonym. <a href=\"https://pyinstaller.org/\">PyInstaller</a> seemed like an obvious choice. However, PyInstaller cannot cross-compile code as it stands. Being a <em>Linux</em> user, offering to help <em>Windows</em> users, meant I should find a workaround e.g. leveraging GitHub Actions for cross-compilation.</p>\n<h2>\nTaking The Scenic Route</h2>\n<p>\nOut of curiosity, I decided to have a poke around a couple of different languages and ecosystems that could teach me a few things while helping me understand what is a viable alternative to Python.</p>\n<h3>\nJulia</h3>\n<p>\nThe <a href=\"https://julialang.org/\">Julia</a> programming language has caught my eye since 2014. It partially reminded me of MATLAB, that I used for my PhD. Familiarity aside, it’s a great language to develop with. 
It’s fast and interactive, with the best REPL I’ve encountered, and highly promising overall, especially since the release of Julia v1.9.<br>\nTo my understanding, Julia, being a JIT-compiled language, wasn’t designed for static compilation per se. Community efforts have enabled the generation of compiled packages, with <a href=\"https://binarybuilder.org/\">BinaryBuilder</a>, <a href=\"https://julialang.github.io/PackageCompiler.jl\">PackageCompiler</a> and <a href=\"https://github.com/tshort/StaticCompiler.jl\">StaticCompiler</a> being the best-known compilation tools available at the time of writing. From an intermediate Julia user’s point of view, I’ve found that compilation results may vary. Also, to the best of my knowledge, most compilation tools actually <em>package</em> code rather than statically compile it, which may expose valuable IP. Therefore, I concluded that Julia <em>probably</em> isn’t as easy to compile as I initially thought (and hoped).</p>\n<h3>\nElixir</h3>\n<p>\n<a href=\"https://elixir-lang.org/\">Elixir</a> is a fantastic hosted functional language, running on the tried and tested <a href=\"https://en.wikipedia.org/wiki/BEAM_(Erlang_virtual_machine)\">BEAM (Erlang VM)</a>. One of the many things Elixir has going for it is its strong drive towards <em>good</em> documentation. The language’s REPL is also excellent. All in all, Elixir is rapidly evolving and it’s worth experimenting with.<br>\nSince 2021, <a href=\"https://github.com/elixir-nx\">Numerical Elixir (Nx)</a> has progressed by leaps and bounds. The Nx community has managed to produce excellent libraries, with <a href=\"https://livebook.dev/\">Livebook</a> being the best literate programming environment I’ve ever used. 
As far as data applications are concerned, Elixir will become a <em>very strong</em> contender, and it’s well worth keeping a close eye on the language.<br>\nAs for cross-compilation, to my understanding <a href=\"https://hex.pm/packages/burrito\">Burrito</a> is the only tool that makes packaged Elixir code truly portable, albeit at the cost of sizeable executables. Burrito is still a work in progress: not a guaranteed solution for the time being, but a noteworthy tool that’s improving fast.<br>\nI doubted whether this tech stack could meet all my current needs, and Elixir remains a niche language in Data and AI, so I searched for another tech stack for fun and profit.</p>\n<h3>\nDeno (TypeScript)</h3>\n",
"tags": [
"python",
"github-actions",
"ci-cd",
"cross-platform",
"deno",
"typescript"
]
},
{
"date": "2024-10-30",
"title": "💡 TIL: 1.58-bit LLMs Match Full Performance @ 98.6% Energy Reduction",
"url": "/posts/TIL-1bitLLM.html",
"content": "<p>\n<strong>TL;DR:</strong> Ternary-weighted LLMs using only {-1, 0, 1} values (1.58 bits) can match full-precision performance while delivering dramatic efficiency improvements: 2.71x faster inference, 3.55x lower memory usage, and 71.4x lower energy consumption for matrix operations at 3B parameter scale.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nBack in February 2024, a preprint titled <a href=\"https://arxiv.org/abs/2402.17764\">The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits</a> was released. Lots of people picked up on it (simply search for <em>1.58-bits</em> on YouTube, for instance), but it escaped me due to <a href=\"https://xcancel.com/AdamMGrant/status/1851348990589354464\">a busy time at work</a>. It was only when I stumbled across this preprint again recently that I realised what a fantastic idea it is to <a href=\"https://www.youtube.com/watch?v=wCDGiys-nLA\">substitute multiplication with addition or subtraction</a>.</p>\n<h2>\nContributions</h2>\n<p>\nThe TL;DR is that all LLM weights can be ternary, i.e. {-1, 0, 1}. Ternary weights carry 1.58 bits each, while activations use 8 bits. This highly quantised model matches full-precision performance at 3B parameter scale.<br>\nIt also exhibits 2.71x faster inference and 3.55x lower memory usage at 3B scale, plus 71.4x lower energy consumption for matrix multiplication operations. Benefits increase with model scale, e.g. at 70B parameters: a 4.1x speed-up, 8.9x higher throughput and an 11x larger batch size.</p>\n<h2>\nWhat Does 1.58-bits Mean?</h2>\n<p>\nCarnegie Mellon University has a great reference on the <a href=\"https://www.cs.cmu.edu/~dst/Tutorials/Info-Theory/\">basics of Information theory</a>. Applying it to measure the information content of a ternary system {-1, 0, 1}, we notice that:<br>\nEach of the three states {-1, 0, 1} is equally likely, with probability $P = \\frac{1}{3}$. 
The information content is $-(P \\log_2{P})$ summed over all states:</p>\n<p>\n$$ -\\left(\\frac{1}{3} \\log_2\\frac{1}{3} + \\frac{1}{3} \\log_2\\frac{1}{3} + \\frac{1}{3} \\log_2\\frac{1}{3}\\right) = -3 \\times \\frac{1}{3} \\log_2\\frac{1}{3} = -\\log_2\\frac{1}{3} = \\log_2 3 \\approx 1.585 \\text{ bits} $$</p>\n<h2>\nConclusion</h2>\n<p>\nLLMs can match the performance of full-precision models while using only three weight values {-1, 0, 1}, with up to 71.4x lower energy consumption for matrix operations and 3.55x lower memory usage at 3B scale. This breakthrough suggests a new direction for efficient LLM deployment, particularly promising for edge devices and mobile applications, while also opening opportunities for specialized hardware optimized for 1-bit operations.</p>\n",
"tags": [
"til",
"llm",
"performance",
"energy-reduction"
]
},
{
"date": "2024-10-29",
"title": "🗃️ RAG Is Here To Stay",
"url": "/posts/rag-is-here-to-stay.html",
"content": "<p>\n<strong>TL;DR:</strong> Despite larger LLM context windows, Retrieval-Augmented Generation (RAG) remains essential for information curation, data provenance, and overcoming the “lost in the middle” effect, where models struggle with information placed centrally in long contexts. This makes careful retrieval strategies more valuable than simply dumping large amounts of text into expanded context windows.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThis morning I noticed that <a href=\"https://xcancel.com/simonw/status/1850928417363149049\">Simon Willison shared some views on RAG</a>, that <a href=\"https://xcancel.com/burkov/status/1851159933913280647\">Andriy Burkov criticised</a> people who claim that RAG is obsolete, and that other RAG-related discussions are taking place, sparked by recent longer LLM context windows. Below I’m sharing some thoughts based on personal experience.</p>\n<h2>\nRAG</h2>\n<p>\nRAG is not simply a workaround for context limits; it’s a way to carefully curate information and data. It enables provenance and visibility of the data flowing through an LLM pipeline, unlike fine-tuning, which bakes knowledge into the model itself. Importantly, RAG is not a synonym for embeddings. Embedding text is a fantastic way to enable semantic search, especially if the granularity (word, sentence, paragraph or document) is chosen to suit project needs.<br>\nI have successfully reused existing infrastructure to provide one of the largest companies in the world with the ability to quickly retrieve information through Q&amp;A. To achieve this, in the interest of simplicity and leveraging existing infrastructure, I opted against adding moving parts like a Vector DB. Instead, I used plain JSON objects and an agentic system to meet the client’s needs. 
It worked very well, with feedback from higher management being “<em><strong>thank you</strong>, this is mind-blowing</em>”.<br>\nA nice overview of RAG comes from <a href=\"https://www.latent.space/p/llamaindex\">Jerry Liu’s interview on Latent Space</a>.<br>\n<em>Update: a useful open-source tool for <a href=\"https://github.com/Brandon-c-tech/RAG-logger\">RAGLogging</a> just came out.</em></p>\n<h2>\nU-Shaped Performance</h2>\n<p>\nOne LLM behaviour that should be considered before regarding RAG as obsolete is the tendency to attend to information at the beginning and end of the context window. See <a href=\"https://arxiv.org/abs/2307.03172\">Lost in the Middle: How Language Models Use Long Contexts</a> for an empirical analysis.<br>\nThe paper concludes:</p>\n<blockquote>\n <p>\nWe empirically study how language models use long input contexts via a series of controlled experiments. We show that language model performance degrades significantly when changing the position of relevant information, indicating that models struggle to robustly access and use information in long input contexts. In particular, performance is often lowest when models must use information in the middle of long input contexts. We conduct a preliminary investigation of the role of (i) model architecture, (ii) query-aware contextualisation, and (iii) instruction fine-tuning to better understand how they affect how language models use context. Finally, we conclude with a practical case study of open-domain question answering, finding that the performance of language model readers saturates far before retriever recall. Our results and analysis provide a better understanding of how language models use their input context and provides new evaluation protocols for future long-context models. </p>\n</blockquote>\n<p>\nIn other words, simply dumping loads of text or embeddings into an LLM with a big context window (say, 2M tokens) won’t yield great results. There’s more to it than brute forcing.</p>\n<h2>\nConclusion</h2>\n<p>\nExtending context length, as appealing as it may sound, neither simplifies nor solves the issue of creating a good quality AI system that is enriched by large text corpora. It seems that when it comes to larger data volumes, <a href=\"https://www.youtube.com/watch?v=5e1Wzbr8wGU\">semantic search augmented with Graph search</a> could be a more robust, albeit more involved, approach. Solid prompt engineering approaches, including <a href=\"https://www.promptingguide.ai/techniques/cot\">Chain-of-Thought</a>, <a href=\"https://www.promptingguide.ai/techniques/fewshot\">Few-shot prompting</a> etc. are also powerful tools to keep in our toolbox.</p>\n",
"tags": [
"rag",
"llm",
"ai",
"performance"
]
},
{
"date": "2024-10-25",
"title": "💡 TIL: Useful Nuggets from AI Engineers",
"url": "/posts/TIL-useful-AI-nuggets.html",
"content": "<p>\n<strong>TL;DR:</strong> Leading AI researchers emphasise that success in the field now depends more on engineering skills and adaptability than academic credentials, with the most valuable skills being prioritisation, communication, project selection, and willingness to manually inspect data. This suggests that choosing the right problem can multiply output by 10-100x, far more than simply working longer hours can.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nAs I was going through some <a href=\"https://web.stanford.edu/class/cs25/\">CS25: Transformers United V4</a> lectures from Stanford, I stumbled across some pertinent and useful quotes from guest lecturers.</p>\n<h2>\n📖</h2>\n<h3>\n<a href=\"https://xcancel.com/_jasonwei/status/1631017964286922753?cxt=HHwWgsDS-c2MxaItAAAA\">Best AI skillset</a></h3>\n<blockquote>\n <p>\nBest AI skillset in 2018: PhD + long publication record in a specific area<br>\nBest AI skillset in 2023: strong engineering abilities + adapting quickly to new directions without sunk cost fallacy </p>\n</blockquote>\n<h3>\n<a href=\"https://xcancel.com/_jasonwei/status/1514327894746574851\">Advice on choosing a topic</a></h3>\n<blockquote>\n <p>\n[…] the project you choose defines the upper-bound for your success. </p>\n</blockquote>\n<h3>\n<a href=\"https://docs.google.com/presentation/d/1u05yQQaw4QXLVYGLI6o3YoFHv6eC3YN8GvWD8JMumpE\">Study the change itself</a></h3>\n<blockquote>\n <p>\nAI is advancing so fast it is hard to keep up. People spend a lot of time and energy catching up with the latest developments. But not enough attention goes to the old things. 
It is more important to <em>study the change itself</em> </p>\n</blockquote>\n<h3>\n<a href=\"https://xcancel.com/_jasonwei/status/1699860824053911558\">Why I’m 100% transparent with my manager</a></h3>\n<blockquote>\n <p>\nI try to open this [performance] conversation [with my line manager] by asking “what can I do better”.<br>\nI tend to use my 1-1s to talk about bigger picture stuff. […] since that’s where managers can help the most. Of course, all this [honesty with your manager] going well is conditional on working in a healthy company and having a decent manager. […] do you want to keep working for someone who doesn’t ask for feedback, or who doesn’t take your problems seriously? </p>\n</blockquote>\n<h3>\n<a href=\"https://xcancel.com/_jasonwei/status/1689346627428036608\">My strengths are communication and prioritization</a></h3>\n<blockquote>\n <p>\n[…] a friend recently asked me what were the best skills I had. […] I said prioritization and communication. These skills are relatively general but happen to be very important for AI research. </p>\n</blockquote>\n<h3>\n<a href=\"https://xcancel.com/_jasonwei/status/1701665241652945283\">Many great managers do IC (Individual Contributor) work</a></h3>\n<blockquote>\n <p>\nIt seems to be not a coincidence that some of the strongest leaders in AI who manage large teams frequently do very low-level technical work. </p>\n</blockquote>\n<h3>\n<a href=\"https://xcancel.com/_jasonwei/status/1708921475829481683\">Manually inspect data</a></h3>\n<blockquote>\n <p>\n[…] great AI researchers are willing to manually inspect lots of data. And more than that, they build infrastructure that allows them to manually inspect data quickly. Though not glamorous, manually examining data gives valuable intuitions about the problem. </p>\n</blockquote>\n<h3>\n<a href=\"https://xcancel.com/_jasonwei/status/1731780538405716078\">Read informal write-ups</a></h3>\n<blockquote>\n <p>\n[…] I like to look at the process of how they [great researchers] got there. </p>\n</blockquote>\n<h3>\n<a href=\"https://xcancel.com/_jasonwei/status/1766692847078756557\">Advice from Bryan Johnson</a></h3>\n<blockquote>\n <p>\n[…] having good health enables clear thinking, which is by far the biggest leverage in AI. […] While it’s possible to double output by working twice as many hours, choosing a better project has the potential to 10x or even 100x output. </p>\n</blockquote>\n",
"tags": [
"til",
"ai",
"llm"
]
},
{
"date": "2024-10-08",
"title": "🔁 GitHub Actions for yt-dlp-hq",
"url": "/posts/gh-actions-ytdlphq.html",
"content": "<p>\n<strong>TL;DR:</strong> This article details the development of a Deno-based tool for downloading high-quality videos with audio using yt-dlp, highlighting unexpected challenges with GitHub Actions where compiled executables became unusable after release. The issue was ultimately solved by packing executables into .tar files, preserving functionality whilst revealing potential limitations in GitHub’s release mechanisms.</p>\n<!--more-->\n<p>\n<strong><a href=\"https://xkcd.com/1205/\">Was it worth my time</a>?</strong></p>\n<h2>\nIntroduction</h2>\n<p>\nThere are times when I need to use my computer offline, e.g. when I’m travelling. Having to stay offline is a good opportunity for me to study some lectures of interest, without distractions. For that, I need offline access to the videos I’m interested in.<br>\n<a href=\"https://github.com/yt-dlp/yt-dlp\">yt-dlp</a> is a great open-source project that allows the user to download audio and/or video from a wide array of platforms including YouTube. Recently, I noticed that it’s no longer as straightforward to download a video with audio using <code class=\"inline\">yt-dlp</code>. One workaround is to download the audio and video streams separately, and merge them using <a href=\"https://ffmpeg.org/\">FFmpeg</a>. This was a good opportunity to write a small automation project in a language I’m interested in.</p>\n<h3>\n<code class=\"inline\">yt-dlp</code> And YouTube</h3>\n<p>\nHere’s an example that motivates implementing this project. Imagine I’d like to download a video from the excellent <a href=\"https://www.youtube.com/channel/UCKWaEZ-_VweaEx1j62do_vQ\">IBM Technology</a> YouTube channel, for instance <a href=\"https://www.youtube.com/watch?v=F8NKVhkZZWI\">What are AI Agents</a>. 
Listing the video’s available formats, returns the following table</p>\n<pre><code class=\"console\">$ yt-dlp -F https://www.youtube.com/watch\\?v\\=F8NKVhkZZWI\n[youtube] Extracting URL: https://www.youtube.com/watch?v=F8NKVhkZZWI\n[youtube] F8NKVhkZZWI: Downloading webpage\n[youtube] F8NKVhkZZWI: Downloading ios player API JSON\n[youtube] F8NKVhkZZWI: Downloading web creator player API JSON\n[youtube] F8NKVhkZZWI: Downloading m3u8 information\n[info] Available formats for F8NKVhkZZWI:\nID EXT RESOLUTION FPS CH │ FILESIZE TBR PROTO │ VCODEC VBR ACODEC ABR ASR MORE INFO\n─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────\nsb3 mhtml 48x27 0 │ mhtml │ images storyboard\nsb2 mhtml 80x45 0 │ mhtml │ images storyboard\nsb1 mhtml 160x90 0 │ mhtml │ images storyboard\nsb0 mhtml 320x180 0 │ mhtml │ images storyboard\n233 mp4 audio only │ m3u8 │ audio only unknown [en] Default\n234 mp4 audio only │ m3u8 │ audio only unknown [en] Default\n139-drc m4a audio only 2 │ 4.35MiB 49k https │ audio only mp4a.40.5 49k 22k [en] low, DRC, m4a_dash\n139 m4a audio only 2 │ 4.35MiB 49k https │ audio only mp4a.40.5 49k 22k [en] low, m4a_dash\n249 webm audio only 2 │ 4.37MiB 49k https │ audio only opus 49k 48k [en] low, webm_dash\n250 webm audio only 2 │ 5.27MiB 59k https │ audio only opus 59k 48k [en] low, webm_dash\n140-drc m4a audio only 2 │ 11.55MiB 129k https │ audio only mp4a.40.2 129k 44k [en] medium, DRC, m4a_dash\n140 m4a audio only 2 │ 11.55MiB 129k https │ audio only mp4a.40.2 129k 44k [en] medium, m4a_dash\n251 webm audio only 2 │ 9.45MiB 106k https │ audio only opus 106k 48k [en] medium, webm_dash\n602 mp4 256x144 15 │ ~ 7.26MiB 81k m3u8 │ vp09.00.10.08 81k video only\n394 mp4 256x144 30 │ 4.36MiB 49k https │ av01.0.00M.08 49k video only 144p, mp4_dash\n269 mp4 256x144 30 │ ~ 11.26MiB 126k m3u8 │ avc1.4D400C 126k video only\n160 mp4 256x144 30 │ 2.97MiB 33k https │ avc1.4D400C 33k video only 144p, 
mp4_dash\n603 mp4 256x144 30 │ ~ 13.63MiB 153k m3u8 │ vp09.00.11.08 153k video only\n278 webm 256x144 30 │ 7.23MiB 81k https │ vp09.00.11.08 81k video only 144p, webm_dash\n395 mp4 426x240 30 │ 5.90MiB 66k https │ av01.0.00M.08 66k video only 240p, mp4_dash\n229 mp4 426x240 30 │ ~ 14.99MiB 168k m3u8 │ avc1.4D4015 168k video only\n133 mp4 426x240 30 │ 4.49MiB 50k https │ avc1.4D4015 50k video only 240p, mp4_dash\n604 mp4 426x240 30 │ ~ 21.46MiB 241k m3u8 │ vp09.00.20.08 241k video only\n242 webm 426x240 30 │ 7.38MiB 83k https │ vp09.00.20.08 83k video only 240p, webm_dash\n396 mp4 640x360 30 │ 10.27MiB 115k https │ av01.0.01M.08 115k video only 360p, mp4_dash\n230 mp4 640x360 30 │ ~ 29.86MiB 335k m3u8 │ avc1.4D401E 335k video only\n134 mp4 640x360 30 │ 7.76MiB 87k https │ avc1.4D401E 87k video only 360p, mp4_dash\n18 mp4 640x360 30 2 │ 32.38MiB 363k https │ avc1.42001E mp4a.40.2 44k [en] 360p\n605 mp4 640x360 30 │ ~ 39.56MiB 444k m3u8 │ vp09.00.21.08 444k video only\n243 webm 640x360 30 │ 12.52MiB 140k https │ vp09.00.21.08 140k video only 360p, webm_dash\n397 mp4 854x480 30 │ 17.06MiB 191k https │ av01.0.04M.08 191k video only 480p, mp4_dash\n231 mp4 854x480 30 │ ~ 37.90MiB 425k m3u8 │ avc1.4D401F 425k video only\n135 mp4 854x480 30 │ 11.28MiB 126k https │ avc1.4D401F 126k video only 480p, mp4_dash\n606 mp4 854x480 30 │ ~ 50.26MiB 564k m3u8 │ vp09.00.30.08 564k video only\n244 webm 854x480 30 │ 17.41MiB 195k https │ vp09.00.30.08 195k video only 480p, webm_dash\n398 mp4 1280x720 30 │ 31.31MiB 351k https │ av01.0.05M.08 351k video only 720p, mp4_dash\n232 mp4 1280x720 30 │ ~ 57.85MiB 649k m3u8 │ avc1.4D401F 649k video only\n136 mp4 1280x720 30 │ 20.16MiB 226k https │ avc1.4D401F 226k video only 720p, mp4_dash\n609 mp4 1280x720 30 │ ~ 72.26MiB 810k m3u8 │ vp09.00.31.08 810k video only\n247 webm 1280x720 30 │ 27.69MiB 310k https │ vp09.00.31.08 310k video only 720p, webm_dash\n399 mp4 1920x1080 30 │ 63.06MiB 707k https │ av01.0.08M.08 707k video only 1080p, 
mp4_dash\n270 mp4 1920x1080 30 │ ~193.27MiB 2167k m3u8 │ avc1.640028 2167k video only\n137 mp4 1920x1080 30 │ 96.17MiB 1078k https │ avc1.640028 1078k video only 1080p, mp4_dash\n614 mp4 1920x1080 30 │ ~164.68MiB 1847k m3u8 │ vp09.00.40.08 1847k video only\n248 webm 1920x1080 30 │ 91.43MiB 1025k https │ vp09.00.40.08 1025k video only 1080p, webm_dash\n616 mp4 1920x1080 30 │ ~322.53MiB 3617k m3u8 │ vp09.00.40.08 3617k video only Premium</code></pre>\n<p>\nIt looks like the only ID containing a video <em>with</em> audio, is <code class=\"inline\">18</code>, i.e. a 420p, 640x360 video according to my media player. This might be sufficient for a video like the above, but such low resolution would make it almost impossible to read code or smaller writing.</p>\n<h2>\nMy Solution</h2>\n<p>\nGiven I have started leveraging Deno for my needs, I wrote a small tool called <a href=\"https://github.com/ai-mindset/yt-dlp-hq\">yt-dlp-hq</a>. It’s certainly basic, with lots of room for improvement. However it does exactly what I need and I’m relatively happy with the result, pending some improvements[^1].\\ Deno is great for cross-compilation. Also, GitHub Actions can be a good method for automating testing, running, compiling etc.[^2] Is it though? Let’s see.</p>\n<h2>\nMy CI Pipeline</h2>\n<p>\nI started off by using <a href=\"https://nektosact.com/introduction.html\">act</a>, a very nice tool that allows for testing pipelines locally. The main downside I found was that for an intermediate Docker user with little <code class=\"inline\">act</code> experience, sometimes GitHub Actions don’t behave the same way locally as they would online. 
Also, I like <a href=\"https://podman.io/\">podman</a> considerably better, since it’s daemonless and less resource-hungry, among other things.<br>\nPutting <code class=\"inline\">act</code> aside, I focused on setting up a <a href=\"https://github.com/ai-mindset/yt-dlp-hq/blob/main/.github/workflows/ci.yml\">pipeline</a> that would work well enough on every new PR opened against <code class=\"inline\">main</code>, among other triggers.<br>\nThe pipeline ran successfully and, in theory, built and released <code class=\"inline\">yt-dlp-hq</code> executables. However, when I downloaded the executable for my OS and CPU architecture, it did not run. When I built the same set of executables locally by running <code class=\"inline\">deno task build</code>, the executable for my OS and architecture worked as expected. This made me wonder whether I was doing something wrong, whether it was a GitHub Actions intricacy, or whether there was some other issue I needed to resolve. Inspired by Medicine, I tried approaching the issue through differential diagnosis, which to my understanding works by excluding other causes in order to home in on the actual medical condition. That is, I first created a release directory locally. I then manually created a release on GitHub. To my dismay, the executable I manually uploaded didn’t run when I downloaded it back from GitHub. This made me wonder whether a conversion takes place when a pipeline generates executables or a user uploads them manually for release. Spoiler alert: I still don’t know if that’s the case, but I suspect that GitHub doesn’t store executables without some change taking place during upload.</p>\n<h3>\nFixing Executables GitHub Release</h3>\n<p>\nInitially, I changed the following setting on my repository:<br>\nunder <em>“Settings -&gt; Actions -&gt; General -&gt; Workflow permissions”</em> I selected <em>“Read and write permissions”</em>. Then, I experimented with packing each generated executable into a .tar file. This did the trick. 
Simply wrapping an executable in a .tar archive is enough to keep it working. A plausible explanation is that tar stores Unix file permissions, including the executable bit, which a file downloaded directly over HTTP does not retain. Thus, installing <code class=\"inline\">yt-dlp-hq</code> takes one extra step.<br>\nFor example, if you’re a Linux user on an Intel-based machine, here’s how you can use my tool</p>\n<pre><code class=\"console\">$ curl -L -O https://github.com/ai-mindset/yt-dlp-hq/releases/download/1.0.0/yt-dlp-hq-intel-linux.tar &amp;&amp; tar xvf yt-dlp-hq-intel-linux.tar &amp;&amp; cd release\n$ ./yt-dlp-hq-intel-linux https://www.youtube.com/watch?v=dQw4w9WgXcQ</code></pre>\n<h2>\nConclusion</h2>\n<p>\nI’m glad I learned something more about GitHub and Actions, their idiosyncrasies and abilities. It took me a couple of days, which made me consider the benefits of <a href=\"https://xkcd.com/1319/\">automation</a>. Being more of a minimalist, I tend to opt for simple automation when possible <a href=\"https://xkcd.com/1205/\"><em>if</em> it’s worth it</a>. To quote <a href=\"https://en.wikiquote.org/wiki/Alan_Perlis\">Alan Perlis</a>, “<em>Simplicity does not precede complexity, but follows it</em>”.</p>\n<hr class=\"thin\">\n<p>\n[^1]: Some improvements I’m planning include unit testing, automatic audio and video ID selection and possibly automatic FFmpeg installation when it’s not available in <code class=\"inline\">$PATH</code>.</p>\n<p>\n[^2]: A <a href=\"https://julialang.org/\">Julia</a> enthusiast introduced me to <a href=\"https://woodpecker-ci.org/\">Woodpecker CI</a> and <a href=\"https://codeberg.org/\">Codeberg</a>. I’m definitely considering switching, following my recent GitHub Actions experience 🤔</p>\n",
"tags": [
"github-actions",
"ci-cd",
"yt-dlp",
"deno",
"typescript",
"cross-platform"
]
},
{
"date": "2024-09-06",
"title": "📖 Python To TypeScript Cheatsheet",
"url": "/posts/python-typescript-cheatsheet.html",
"content": "<p>\n<strong>TL;DR:</strong> This compact reference guide provides side-by-side comparisons of Python and TypeScript syntax for common programming constructs including variable declarations, functions, classes, control flow structures, and error handling, serving as a quick reference for Python developers exploring TypeScript within the context of Deno development.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nI’ve been curious as to how Python and TypeScript compare at a high level, for someone new to TypeScript. Below is a cheatsheet I put together with the help of Claude 3.5 Sonnet. It covers basic syntax and is by no means complete or exhaustive, but it gives a first taste of the similarities and differences between the two languages. The reason I am looking into TypeScript is explained in my <a href=\"../deno/\">Deno article</a>.</p>\n<h2>\nVariables And Data Types</h2>\n<table>\n <thead>\n <tr><th>Concept</th><th>Python</th><th>TypeScript</th></tr>\n </thead>\n <tbody>\n <tr><td>Int</td><td><code class=\"inline\">x = 5</code></td><td><code class=\"inline\">let x: number = 5;</code></td></tr>\n <tr><td>Float</td><td><code class=\"inline\">y = 3.14</code></td><td><code class=\"inline\">let y: number = 3.14;</code></td></tr>\n <tr><td>Str</td><td><code class=\"inline\">name = &quot;John&quot;</code></td><td><code class=\"inline\">let name: string = &quot;John&quot;;</code></td></tr>\n <tr><td>Bool</td><td><code class=\"inline\">is_valid = True</code></td><td><code class=\"inline\">let isValid: boolean = true;</code></td></tr>\n <tr><td>List</td><td><code class=\"inline\">numbers = [1, 2, 3]</code></td><td><code class=\"inline\">let numbers: number[] = [1, 2, 3];</code></td></tr>\n <tr><td>Dict</td><td><code class=\"inline\">person = {&quot;name&quot;: &quot;John&quot;, &quot;age&quot;: 30}</code></td><td><code class=\"inline\">let person: { name: string; age: number } = { name: &quot;John&quot;, age: 30 };</code></td></tr>\n </tbody>\n</table>\n<h2>\nFunctions</h2>\n<table>\n <thead>\n <tr><th>Concept</th><th>Python</th><th>TypeScript</th></tr>\n </thead>\n <tbody>\n <tr><td>Func def</td><td><code class=\"inline\">def greet(name: str) -&gt; str:</code></td><td><code class=\"inline\">function greet(name: string): string {</code></td></tr>\n <tr><td>Func return</td><td><code class=\"inline\">return f&quot;Hello, {name}!&quot;</code></td><td><code class=\"inline\">return `Hello, ${name}!`;</code></td></tr>\n <tr><td>Lambda</td><td><code class=\"inline\">lambda x: x * 2</code></td><td><code class=\"inline\">(x: number): number =&gt; x * 2</code></td></tr>\n </tbody>\n</table>\n<h2>\nClasses</h2>\n<table>\n <thead>\n <tr><th>Concept</th><th>Python</th><th>TypeScript</th></tr>\n </thead>\n <tbody>\n <tr><td>Class def</td><td><code class=\"inline\">class Person:</code></td><td><code class=\"inline\">class Person {</code></td></tr>\n <tr><td>Constructor</td><td><code class=\"inline\">def __init__(self, name: str, age: int):</code></td><td><code class=\"inline\">constructor(name: string, age: number) {</code></td></tr>\n <tr><td>Instance vars</td><td><code class=\"inline\">self.name = name</code></td><td><code class=\"inline\">this.name = name;</code></td></tr>\n <tr><td></td><td><code class=\"inline\">self.age = age</code></td><td><code class=\"inline\">this.age = age;</code></td></tr>\n <tr><td>Method def</td><td><code class=\"inline\">def greet(self) -&gt; str:</code></td><td><code class=\"inline\">greet(): string {</code></td></tr>\n <tr><td>Method return</td><td><code class=\"inline\">return f&quot;Hello, I'm {self.name}!&quot;</code></td><td><code class=\"inline\">return `Hello, I'm ${this.name}!`;</code></td></tr>\n </tbody>\n</table>\n<h2>\nControl Flow</h2>\n<table>\n <thead>\n <tr><th>Concept</th><th>Python</th><th>TypeScript</th></tr>\n </thead>\n <tbody>\n <tr><td>If</td><td><code class=\"inline\">if x &gt; 0:</code></td><td><code class=\"inline\">if (x &gt; 0) {</code></td></tr>\n <tr><td>Else if</td><td><code class=\"inline\">elif x &lt; 0:</code></td><td><code class=\"inline\">} else if (x &lt; 0) {</code></td></tr>\n <tr><td>Else</td><td><code class=\"inline\">else:</code></td><td><code class=\"inline\">} else {</code></td></tr>\n <tr><td>For loop</td><td><code class=\"inline\">for i in range(5):</code></td><td><code class=\"inline\">for (let i = 0; i &lt; 5; i++) {</code></td></tr>\n <tr><td>While loop</td><td><code class=\"inline\">while x &gt; 0:</code></td><td><code class=\"inline\">while (x &gt; 0) {</code></td></tr>\n </tbody>\n</table>\n<h2>\nError Handling</h2>\n<table>\n <thead>\n <tr><th>Concept</th><th>Python</th><th>TypeScript</th></tr>\n </thead>\n <tbody>\n <tr><td>Try</td><td><code class=\"inline\">try:</code></td><td><code class=\"inline\">try {</code></td></tr>\n <tr><td>Except/Catch</td><td><code class=\"inline\">except ZeroDivisionError:</code></td><td><code class=\"inline\">} catch (error) {</code></td></tr>\n <tr><td>Finally</td><td><code class=\"inline\">finally:</code></td><td><code class=\"inline\">} finally {</code></td></tr>\n </tbody>\n</table>\n",
"tags": [
"python",
"typescript",
"cheatsheet"
]
},
{
"date": "2024-09-05",
"title": "🏗️ Modern Data Science and AI Engineering with Deno 2.0",
"url": "/posts/deno.html",
"content": "<p>\n<strong>TL;DR:</strong> Deno 2.0 offers a compelling alternative to Python for AI and data science workflows by providing zero-configuration TypeScript support, native security features, cross-compilation capabilities, and an ecosystem of essential tools-addressing Python’s environment management complexities and deployment frictions whilst enabling production-ready development from proof-of-concept through to single-binary distribution.</p>\n<!--more-->\n<h2>\nIntroduction</h2>\n<p>\nThe landscape of Data Science and AI engineering is at a critical inflection point. While Python has dominated data science and machine learning, its fragmented ecosystem and deployment complexities increasingly impede production systems. I’ve already touched on <a href=\"{{ site.baseurl }}{% link _posts/2024-11-21-bring-it-back-to-basics.md %}\">my solution to Python’s fragmentation</a>.\\ Deno 2.0 emerges as a compelling solution to these challenges, bringing together a number of technologies that cover most computing requirements across a very wide range of domains. 
Having said that, JavaScript (JS) and its superset TypeScript (TS) are <a href=\"https://www.youtube.com/watch?v=aXOChLn5ZdQ\">far from perfect languages</a>[^1], but this discussion is outside the scope of this blog post.</p>\n<p>\nThe key factors driving change are:</p>\n<ul>\n <li>\nPython environment management complexity </li>\n <li>\nProduction security requirements </li>\n <li>\nDeployment workflow friction </li>\n <li>\nNeed for type safety in large-scale AI applications </li>\n</ul>\n<h2>\nThe Deno Advantage and Ecosystem</h2>\n<h3>\nCore Capabilities</h3>\n<p>\nDeno 2.0 provides a comprehensive, zero-configuration solution with:</p>\n<ul>\n <li>\nNative TS support </li>\n <li>\nFirst-class security features </li>\n <li>\nCross-compilation through <code class=\"inline\">deno compile</code> </li>\n <li>\nBuilt-in development tools </li>\n</ul>\n<p>\nAs Ryan Dahl emphasized in a recent <a href=\"https://www.youtube.com/watch?v=tZBCq8Ijkgw\">Syntax podcast episode</a>: “<em>Deno works really great as a single file. It’s really great for scripting, […] you can just put some imports in and start working from a single file. And that is actually exactly what you want from notebooks</em>“. 
This aligns with recent work by <a href=\"https://www.answer.ai/\">Answer.AI</a>’s <a href=\"https://www.alexisgallagher.com/\">Alexis Gallagher</a> on <a href=\"https://youtube.com/watch?v=t6-Uup-Alfs\">single-script Python development</a>.</p>\n<h3>\nAI and Data Processing Tools</h3>\n<p>\nThe ecosystem provides direct parallels to Python’s essential tools:</p>\n<p>\nData Processing:</p>\n<ul>\n <li>\n<a href=\"https://pola-rs.github.io/nodejs-polars/\">nodejs-polars</a> for high-performance DataFrame operations </li>\n <li>\n<a href=\"https://observablehq.com/plot/\">Observable Plot</a> for modern visualisation </li>\n</ul>\n<p>\nMachine Learning:</p>\n<ul>\n <li>\n<a href=\"https://js.langchain.com/\">LangChain.js</a> and <a href=\"https://ts.llamaindex.ai/\">LlamaIndex.ts</a> for LLM applications </li>\n <li>\n<a href=\"https://huggingface.co/docs/transformers.js/index\">Transformers.js</a> for Hugging Face integration </li>\n <li>\n<a href=\"https://www.tensorflow.org/js\">TensorFlow.js</a> and <a href=\"https://github.com/nuanio/xgboost-node\">XGBoost-node</a> for ML tasks </li>\n</ul>\n<p>\nInfrastructure:</p>\n<ul>\n <li>\nNative <a href=\"https://docs.deno.com/runtime/manual/basics/connecting_to_databases/#postgres\">Postgres</a> and <a href=\"https://docs.deno.com/runtime/manual/basics/connecting_to_databases/#mongodb\">MongoDB</a> support </li>\n <li>\n<a href=\"https://github.com/qdrant/qdrant-js\">Qdrant JS</a> for vector storage </li>\n <li>\n<a href=\"https://stdlib.io/docs/api/latest\">STDLib</a> for extended functionality </li>\n <li>\n<a href=\"https://github.com/axa-group/nlp.js/\">NLP.js</a> and <a href=\"https://github.com/spencermountain/compromise\">compromise</a> for NLP </li>\n</ul>\n<h2>\nA Pragmatic Decision Framework</h2>\n<p>\nModern AI systems need to balance rapid experimentation with production-ready stability. 
Through experimentation, I’ve found that a TS-based approach using Deno provides an elegant solution to both needs.</p>\n<h3>\nProduction-First Design</h3>\n<p>\nFor a start, a zero-configuration Deno-based environment makes it easy to take code from proof of concept (POC) to production. This gives the user native security, cross-compilation capabilities and simple single-binary distribution, eliminating many traditional deployment headaches. While Python remains popular for Data Science and AI research, Deno with simple TS[^2] has been able to handle most of my computational needs equally well, in a lightweight and productive way.</p>\n<h3>\nPractical Implementation Guide</h3>\n<p>\nTransitioning to this kind of all-in-one Deno-driven architecture can start with tools like LangChain.js or LlamaIndex.ts for LLM applications. Data processing can be handled efficiently through nodejs-polars, while Observable Plot provides powerful visualisation. To keep things simple, REST or GraphQL can handle service communication, with shared data stores and container-based deployment maintaining clear service boundaries. This approach supports both monolithic and microservice architectures, depending on project needs.</p>\n<h3>\nDevelopment Best Practices</h3>\n<p>\n<a href=\"{{ site.baseurl }}{% link _posts/2024-11-22-iterative-refinement.md %}\">Iterative refinement development</a> remains an equally productive approach. Strong typing helps with development and code robustness, while correctly used async/await patterns keep the system responsive. This approach enables rapid prototyping without sacrificing production readiness.</p>\n<h2>\nConclusion</h2>\n<p>\nDeno with TS is a viable and usually more lightweight alternative to Python for developing maintainable, secure and production-ready Data and AI systems. 
Deno’s zero-config setup, extensive tooling, security focus and stability address key pain points I have encountered in my Python development journey. The Deno ecosystem has reached maturity, making it, in my experience, a viable and often superior alternative to traditional Python-based approaches for modern AI engineering workflows.</p>\n<hr class=\"thin\">\n<p>\n[^1]: There are noteworthy, though sadly not as widely used, languages such as <a href=\"https://clojure.org/\">Clojure</a> and <a href=\"https://racket-lang.org/\">Racket</a>, backed by <a href=\"https://en.wikipedia.org/wiki/Lisp_(programming_language)\">computer science research</a>, that pioneered concepts like iterative refinement (aka REPL-driven development) among others.</p>\n<p>\n[^2]: “simple” in this context refers to leveraging types but avoiding more involved TypeScript ideas.</p>\n",
"tags": [
"ai",
"llm",
"cross-platform",
"data-processing",
"data-science",
"deno",
"machine-learning",
"minimal",
"polars",
"production",
"deployment",
"toolchain",
"typescript",
"security",
"zero-config"
]
}
]