Detect from OCaml which human language a document uses; derived from the Nu Html validator
languages unicode ocaml

Add langdetect-js package for browser language detection

New langdetect-js package provides:
- JavaScript API via js_of_ocaml (langdetect.detect, detectAll, etc.)
- Browser-based regression tests across 36 languages
- Interactive demo page (test.html)

Build with: opam exec -- dune build lib/js/
Open test.html in a browser to run the tests and try the demo.

API:
langdetect.detect(text) → lang code or null
langdetect.detectWithProb(text) → {lang, prob} or null
langdetect.detectAll(text) → [{lang, prob}, ...]
langdetect.languages() → array of supported language codes
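
A minimal sketch of how a page might consume the API listed above. `pickLanguage`, `candidatesAbove`, the `minProb` cutoff and the `"und"` fallback are hypothetical and not part of this package; the `api` argument stands in for `window.langdetect` so the helpers can also be exercised against a stub.

```javascript
// Hypothetical helper (not part of langdetect-js): return the best language
// code only when the detector is confident enough, else a fallback value.
// `api` is assumed to have the shape of window.langdetect from this commit.
function pickLanguage(api, text, { minProb = 0.9, fallback = "und" } = {}) {
  const best = api.detectWithProb(text); // {lang, prob} or null
  if (best === null || best.prob < minProb) {
    return fallback;
  }
  return best.lang;
}

// Hypothetical helper: language codes whose probability is at or above
// `minProb`, in the order detectAll returned them.
function candidatesAbove(api, text, minProb) {
  return api
    .detectAll(text) // [{lang, prob}, ...]
    .filter((r) => r.prob >= minProb)
    .map((r) => r.lang);
}
```

In a browser this would be called as `pickLanguage(window.langdetect, someText)`; passing the API object explicitly keeps the helpers independent of the global.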

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

+764 -2
+15 -1
dune-project
···
  (synopsis "Language detection library using n-gram frequency analysis")
  (description
   "An OCaml port of the Cybozu langdetect algorithm. Detects the natural \
-    language of text using n-gram frequency profiles. Supports 49 languages \
+    language of text using n-gram frequency profiles. Supports 47 languages \
     including English, Chinese, Japanese, Arabic, and many European languages.")
  (depends
   (ocaml (>= 5.1.0))
   (uutf (>= 1.0.0))
   (odoc :with-doc)
   (alcotest (and :with-test (>= 1.7.0)))))
+
+(package
+ (name langdetect-js)
+ (synopsis "Language detection for browsers via js_of_ocaml/wasm_of_ocaml")
+ (description
+  "Browser-compatible language detection compiled with js_of_ocaml. \
+   Provides a JavaScript API for detecting languages in web applications. \
+   Supports both JS and WebAssembly output for optimal performance.")
+ (depends
+  (ocaml (>= 5.1.0))
+  (langdetect (= :version))
+  (brr (>= 0.0.6))
+  (js_of_ocaml (>= 5.0.0))
+  (js_of_ocaml-compiler (>= 5.0.0))))
+34
langdetect-js.opam
···
+# This file is generated by dune, edit dune-project instead
+opam-version: "2.0"
+synopsis: "Language detection for browsers via js_of_ocaml/wasm_of_ocaml"
+description:
+  "Browser-compatible language detection compiled with js_of_ocaml. Provides a JavaScript API for detecting languages in web applications. Supports both JS and WebAssembly output for optimal performance."
+maintainer: ["Anil Madhavapeddy <anil@recoil.org>"]
+authors: ["Anil Madhavapeddy"]
+license: "MIT"
+homepage: "https://tangled.org/@anil.recoil.org/ocaml-langdetect"
+bug-reports: "https://tangled.org/@anil.recoil.org/ocaml-langdetect/issues"
+depends: [
+  "dune" {>= "3.20"}
+  "ocaml" {>= "5.1.0"}
+  "langdetect" {= version}
+  "brr" {>= "0.0.6"}
+  "js_of_ocaml" {>= "5.0.0"}
+  "js_of_ocaml-compiler" {>= "5.0.0"}
+  "odoc" {with-doc}
+]
+build: [
+  ["dune" "subst"] {dev}
+  [
+    "dune"
+    "build"
+    "-p"
+    name
+    "-j"
+    jobs
+    "@install"
+    "@runtest" {with-test}
+    "@doc" {with-doc}
+  ]
+]
+x-maintenance-intent: ["(latest)"]
+1 -1
langdetect.opam
···
 opam-version: "2.0"
 synopsis: "Language detection library using n-gram frequency analysis"
 description:
-  "An OCaml port of the Cybozu langdetect algorithm. Detects the natural language of text using n-gram frequency profiles. Supports 49 languages including English, Chinese, Japanese, Arabic, and many European languages."
+  "An OCaml port of the Cybozu langdetect algorithm. Detects the natural language of text using n-gram frequency profiles. Supports 47 languages including English, Chinese, Japanese, Arabic, and many European languages."
 maintainer: ["Anil Madhavapeddy <anil@recoil.org>"]
 authors: ["Anil Madhavapeddy"]
 license: "MIT"
+38
lib/js/dune
···
+; Langdetect JavaScript Library
+; Compiled with js_of_ocaml for browser use
+
+(library
+ (name langdetect_js)
+ (public_name langdetect-js)
+ (libraries langdetect brr)
+ (modes byte) ; js_of_ocaml requires bytecode
+ (modules langdetect_js))
+
+; Standalone JavaScript file for direct browser use
+(executable
+ (name langdetect_js_main)
+ (libraries langdetect_js)
+ (js_of_ocaml
+  (javascript_files))
+ (modes js)
+ (modules langdetect_js_main))
+
+; Browser-based test runner
+(executable
+ (name langdetect_js_tests)
+ (libraries langdetect_js)
+ (js_of_ocaml
+  (javascript_files))
+ (modes js)
+ (modules langdetect_js_tests))
+
+; Copy to nice filenames (JS)
+(rule
+ (targets langdetect.js)
+ (deps langdetect_js_main.bc.js)
+ (action (copy %{deps} %{targets})))
+
+(rule
+ (targets langdetect-tests.js)
+ (deps langdetect_js_tests.bc.js)
+ (action (copy %{deps} %{targets})))
+95
lib/js/langdetect_js.ml
···
+(*---------------------------------------------------------------------------
+   Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>
+   SPDX-License-Identifier: MIT
+  ---------------------------------------------------------------------------*)
+
+(** JavaScript bindings for langdetect.
+
+    This module provides browser-compatible language detection via js_of_ocaml.
+    It exposes a simple API on window.langdetect for detecting languages in text. *)
+
+
+(** The detector instance, lazily initialized *)
+let detector = lazy (Langdetect.create_default ())
+
+(** Detect the language of text, returning the best match or null *)
+let detect_best text =
+  let t = Lazy.force detector in
+  Langdetect.detect_best t text
+
+(** Detect language with probability score *)
+let detect_with_prob text =
+  let t = Lazy.force detector in
+  Langdetect.detect_with_prob t text
+
+(** Detect all matching languages above threshold *)
+let detect_all text =
+  let t = Lazy.force detector in
+  Langdetect.detect t text
+
+(** Get list of supported languages *)
+let supported_languages () =
+  let t = Lazy.force detector in
+  Langdetect.supported_languages t
+
+(** Console logging helper *)
+let console_log msg =
+  ignore (Jv.call (Jv.get Jv.global "console") "log" [| Jv.of_string msg |])
+
+(** Convert a detection result to a JavaScript object *)
+let result_to_jv (r : Langdetect.result) =
+  Jv.obj [|
+    "lang", Jv.of_string r.lang;
+    "prob", Jv.of_float r.prob;
+  |]
+
+(** Register the API on a JavaScript object *)
+let register_api_on obj =
+  (* detect(text) -> string | null *)
+  Jv.set obj "detect" (Jv.callback ~arity:1 (fun text_jv ->
+    let text = Jv.to_string text_jv in
+    match detect_best text with
+    | Some lang -> Jv.of_string lang
+    | None -> Jv.null
+  ));
+
+  (* detectWithProb(text) -> {lang, prob} | null *)
+  Jv.set obj "detectWithProb" (Jv.callback ~arity:1 (fun text_jv ->
+    let text = Jv.to_string text_jv in
+    match detect_with_prob text with
+    | Some (lang, prob) ->
+      Jv.obj [|
+        "lang", Jv.of_string lang;
+        "prob", Jv.of_float prob;
+      |]
+    | None -> Jv.null
+  ));
+
+  (* detectAll(text) -> [{lang, prob}, ...] *)
+  Jv.set obj "detectAll" (Jv.callback ~arity:1 (fun text_jv ->
+    let text = Jv.to_string text_jv in
+    let results = detect_all text in
+    Jv.of_list result_to_jv results
+  ));
+
+  (* languages() -> string[] *)
+  Jv.set obj "languages" (Jv.callback ~arity:0 (fun () ->
+    let langs = supported_languages () in
+    Jv.of_array Jv.of_string langs
+  ));
+
+  (* version *)
+  Jv.set obj "version" (Jv.of_string "1.0.0")
+
+(** Register the global API on window.langdetect *)
+let register_global_api () =
+  let api = Jv.obj [||] in
+  register_api_on api;
+  Jv.set Jv.global "langdetect" api;
+
+  (* Dispatch 'langdetectReady' event for async loaders *)
+  let document = Jv.get Jv.global "document" in
+  let event_class = Jv.get Jv.global "CustomEvent" in
+  let event = Jv.new' event_class [| Jv.of_string "langdetectReady" |] in
+  ignore (Jv.call document "dispatchEvent" [| event |]);
+  console_log "[langdetect] API ready - 47 languages loaded"
+9
lib/js/langdetect_js_main.ml
···
+(*---------------------------------------------------------------------------
+   Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>
+   SPDX-License-Identifier: MIT
+  ---------------------------------------------------------------------------*)
+
+(** Entry point for the standalone JavaScript build.
+    Registers the API on window.langdetect when the script loads. *)
+
+let () = Langdetect_js.register_global_api ()
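
Because `register_global_api` sets `window.langdetect` and dispatches `langdetectReady` on `document` while the script is being evaluated, a listener attached after the bundle has loaded would never fire. A rough sketch of a loader-side helper that covers both orderings follows; `whenLangdetectReady` is a hypothetical name, and `globalObj` and `doc` stand in for `window` and `document`.

```javascript
// Hypothetical helper (not part of langdetect-js): resolve with the API
// object as soon as it is available.
function whenLangdetectReady(globalObj, doc) {
  return new Promise((resolve) => {
    // If the bundle already ran, the event has been dispatched and will not
    // repeat, so check the global before listening.
    if (globalObj.langdetect !== undefined) {
      resolve(globalObj.langdetect);
      return;
    }
    // Otherwise wait for the one-shot 'langdetectReady' event.
    doc.addEventListener(
      "langdetectReady",
      () => resolve(globalObj.langdetect),
      { once: true }
    );
  });
}
```

In a page this might be used as `whenLangdetectReady(window, document).then((api) => api.detect(text))`, for example when the script tag carries `async` or `defer`.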
+267
lib/js/langdetect_js_tests.ml
···
+(*---------------------------------------------------------------------------
+   Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>
+   SPDX-License-Identifier: MIT
+  ---------------------------------------------------------------------------*)
+
+(** Browser-based test runner for langdetect.
+
+    This module runs regression tests in the browser and displays results
+    in the DOM. It demonstrates language detection across multiple languages. *)
+
+open Brr
+
+(** Test case definition *)
+type test_case = {
+  name : string;
+  text : string;
+  expected : string;
+}
+
+(** Test results *)
+type test_result = {
+  test : test_case;
+  detected : string option;
+  prob : float option;
+  passed : bool;
+  time_ms : float;
+}
+
+(** Sample texts for testing various languages *)
+let test_cases = [|
+  (* European languages *)
+  { name = "English"; text = "The quick brown fox jumps over the lazy dog. This is a sample of English text for language detection testing."; expected = "en" };
+  { name = "French"; text = "Bonjour le monde. La langue française est belle et mélodieuse. J'aime beaucoup la culture française."; expected = "fr" };
+  { name = "German"; text = "Guten Tag! Die deutsche Sprache hat viele interessante Eigenschaften. Ich lerne gerne Deutsch."; expected = "de" };
+  { name = "Spanish"; text = "Hola mundo. El español es un idioma muy hablado en todo el mundo. Me gusta mucho la cultura española."; expected = "es" };
+  { name = "Italian"; text = "Ciao mondo! L'italiano è una lingua bellissima. Mi piace molto la cultura italiana e il cibo."; expected = "it" };
+  { name = "Portuguese"; text = "Olá mundo! O português é uma língua muito bonita. Eu gosto muito da cultura portuguesa."; expected = "pt" };
+  { name = "Dutch"; text = "Hallo wereld! De Nederlandse taal is interessant. Ik hou van de Nederlandse cultuur."; expected = "nl" };
+  { name = "Swedish"; text = "Hej världen! Svenska är ett vackert språk. Jag tycker om svensk kultur och mat."; expected = "sv" };
+  { name = "Norwegian"; text = "Hei verden! Norsk er et fint språk. Jeg liker norsk kultur og natur veldig godt."; expected = "no" };
+  { name = "Danish"; text = "Hej verden! Dansk er et interessant sprog. Jeg kan godt lide dansk kultur."; expected = "da" };
+  { name = "Finnish"; text = "Hei maailma! Suomen kieli on erittäin mielenkiintoinen. Pidän suomalaisesta kulttuurista."; expected = "fi" };
+  { name = "Polish"; text = "Witaj świecie! Język polski jest bardzo piękny. Lubię polską kulturę i historię."; expected = "pl" };
+  { name = "Russian"; text = "Привет мир! Русский язык очень красивый и богатый. Я люблю русскую литературу."; expected = "ru" };
+  { name = "Ukrainian"; text = "Привіт світ! Українська мова дуже гарна. Я люблю українську культуру."; expected = "uk" };
+  { name = "Czech"; text = "Ahoj světe! Čeština je zajímavý jazyk. Mám rád českou kulturu a historii."; expected = "cs" };
+  { name = "Romanian"; text = "Salut lume! Limba română este foarte frumoasă. Îmi place cultura românească."; expected = "ro" };
+  { name = "Hungarian"; text = "Szia világ! A magyar nyelv nagyon érdekes. Szeretem a magyar kultúrát."; expected = "hu" };
+  { name = "Greek"; text = "Γειά σου κόσμε! Η ελληνική γλώσσα είναι πολύ όμορφη. Μου αρέσει ο ελληνικός πολιτισμός."; expected = "el" };
+  { name = "Bulgarian"; text = "Здравей свят! Българският език е много красив. Обичам българската култура."; expected = "bg" };
+
+  (* Asian languages *)
+  { name = "Chinese (Simplified)"; text = "你好世界!中文是一种非常古老而美丽的语言。我喜欢学习中国文化和历史。"; expected = "zh-cn" };
+  { name = "Chinese (Traditional)"; text = "你好世界!中文是一種非常古老而美麗的語言。我喜歡學習中國文化和歷史。"; expected = "zh-tw" };
+  { name = "Japanese"; text = "こんにちは世界!日本語はとても美しい言語です。日本の文化が大好きです。"; expected = "ja" };
+  { name = "Korean"; text = "안녕하세요 세계! 한국어는 매우 아름다운 언어입니다. 저는 한국 문화를 좋아합니다."; expected = "ko" };
+  { name = "Vietnamese"; text = "Xin chào thế giới! Tiếng Việt là một ngôn ngữ rất đẹp. Tôi yêu văn hóa Việt Nam."; expected = "vi" };
+  { name = "Thai"; text = "สวัสดีโลก! ภาษาไทยเป็นภาษาที่สวยงาม ฉันชอบวัฒนธรรมไทยมาก"; expected = "th" };
+  { name = "Indonesian"; text = "Halo dunia! Bahasa Indonesia adalah bahasa yang indah. Saya suka budaya Indonesia."; expected = "id" };
+
+  (* Middle Eastern languages *)
+  { name = "Arabic"; text = "مرحبا بالعالم! اللغة العربية جميلة جدا. أنا أحب الثقافة العربية والتاريخ."; expected = "ar" };
+  { name = "Hebrew"; text = "שלום עולם! השפה העברית היא שפה יפה מאוד. אני אוהב את התרבות העברית."; expected = "he" };
+  { name = "Persian"; text = "سلام دنیا! زبان فارسی بسیار زیباست. من فرهنگ ایرانی را دوست دارم."; expected = "fa" };
+  { name = "Turkish"; text = "Merhaba dünya! Türkçe çok güzel bir dil. Türk kültürünü ve tarihini seviyorum."; expected = "tr" };
+
+  (* South Asian languages *)
+  { name = "Hindi"; text = "नमस्ते दुनिया! हिंदी एक बहुत सुंदर भाषा है। मुझे भारतीय संस्कृति बहुत पसंद है।"; expected = "hi" };
+  { name = "Bengali"; text = "হ্যালো বিশ্ব! বাংলা একটি অত্যন্ত সুন্দর ভাষা। আমি বাঙালি সংস্কৃতি পছন্দ করি।"; expected = "bn" };
+  { name = "Tamil"; text = "வணக்கம் உலகம்! தமிழ் ஒரு மிக அழகான மொழி. நான் தமிழ் கலாச்சாரத்தை விரும்புகிறேன்."; expected = "ta" };
+  { name = "Telugu"; text = "హలో ప్రపంచం! తెలుగు చాలా అందమైన భాష. నాకు తెలుగు సంస్కృతి చాలా ఇష్టం."; expected = "te" };
+  { name = "Gujarati"; text = "હેલો વિશ્વ! ગુજરાતી એક સુંદર ભાષા છે. મને ગુજરાતી સંસ્કૃતિ ગમે છે."; expected = "gu" };
+  { name = "Urdu"; text = "ہیلو دنیا! اردو ایک بہت خوبصورت زبان ہے۔ مجھے اردو ادب پسند ہے۔"; expected = "ur" };
+|]
+
+(** Get current time in milliseconds *)
+let now_ms () =
+  Jv.to_float (Jv.call (Jv.get Jv.global "performance") "now" [||])
+
+(** Run a single test *)
+let run_test detector test =
+  let start = now_ms () in
+  let result = Langdetect.detect_with_prob detector test.text in
+  let time_ms = now_ms () -. start in
+  let detected, prob = match result with
+    | Some (lang, p) -> Some lang, Some p
+    | None -> None, None
+  in
+  let passed = match detected with
+    | Some lang -> lang = test.expected
+    | None -> false
+  in
+  { test; detected; prob; passed; time_ms }
+
+(** Run all tests and return results *)
+let run_all_tests () =
+  let detector = Langdetect.create_default () in
+  Array.map (run_test detector) test_cases
+
+(** Create a result row element *)
+let create_result_row result =
+  let status_class = if result.passed then "pass" else "fail" in
+  let status_text = if result.passed then "✓" else "✗" in
+  let detected_text = match result.detected with
+    | Some lang -> lang
+    | None -> "(none)"
+  in
+  let prob_text = match result.prob with
+    | Some p -> Printf.sprintf "%.1f%%" (p *. 100.0)
+    | None -> "-"
+  in
+  let time_text = Printf.sprintf "%.1fms" result.time_ms in
+
+  let tr = El.tr [] in
+  El.set_children tr [
+    El.td [El.txt' result.test.name];
+    El.td ~at:[At.class' (Jstr.v "code")] [El.txt' result.test.expected];
+    El.td ~at:[At.class' (Jstr.v "code")] [El.txt' detected_text];
+    El.td [El.txt' prob_text];
+    El.td [El.txt' time_text];
+    El.td ~at:[At.class' (Jstr.v status_class)] [El.txt' status_text];
+  ];
+  tr
+
+(** Create summary stats *)
+let create_summary results =
+  let total = Array.length results in
+  let passed = Array.fold_left (fun acc r -> if r.passed then acc + 1 else acc) 0 results in
+  let failed = total - passed in
+  let total_time = Array.fold_left (fun acc r -> acc +. r.time_ms) 0.0 results in
+  let avg_time = total_time /. float_of_int total in
+
+  El.div ~at:[At.class' (Jstr.v "summary")] [
+    El.h2 [El.txt' "Test Results"];
+    El.p [
+      El.strong [El.txt' (Printf.sprintf "%d/%d tests passed" passed total)];
+      El.txt' (Printf.sprintf " (%d failed)" failed);
+    ];
+    El.p [
+      El.txt' (Printf.sprintf "Total time: %.1fms (avg %.2fms per test)" total_time avg_time);
+    ];
+  ]
+
+(** Main test runner *)
+let run_tests_ui () =
+  (* Find or create output container *)
+  let container = match El.find_first_by_selector (Jstr.v "#test-results") ~root:(Document.body G.document) with
+    | Some el -> el
+    | None ->
+      let el = El.div ~at:[At.id (Jstr.v "test-results")] [] in
+      El.append_children (Document.body G.document) [el];
+      el
+  in
+
+  (* Show loading message *)
+  El.set_children container [
+    El.p [El.txt' "Running tests..."]
+  ];
+
+  (* Run tests (async to allow UI update) *)
+  let _ = Jv.call Jv.global "setTimeout" [|
+    Jv.callback ~arity:0 (fun () ->
+      let results = run_all_tests () in
+
+      (* Build results table *)
+      let thead = El.thead [
+        El.tr [
+          El.th [El.txt' "Language"];
+          El.th [El.txt' "Expected"];
+          El.th [El.txt' "Detected"];
+          El.th [El.txt' "Confidence"];
+          El.th [El.txt' "Time"];
+          El.th [El.txt' "Status"];
+        ]
+      ] in
+
+      let tbody = El.tbody [] in
+      Array.iter (fun result ->
+        El.append_children tbody [create_result_row result]
+      ) results;
+
+      let table = El.table ~at:[At.class' (Jstr.v "results-table")] [thead; tbody] in
+
+      (* Update container *)
+      El.set_children container [
+        create_summary results;
+        table;
+      ]
+    );
+    Jv.of_int 10
+  |] in
+  ()
+
+(** Interactive demo section *)
+let setup_demo () =
+  let demo_container = match El.find_first_by_selector (Jstr.v "#demo") ~root:(Document.body G.document) with
+    | Some el -> el
+    | None -> Document.body G.document
+  in
+
+  let textarea = El.textarea ~at:[
+    At.id (Jstr.v "demo-input");
+    At.v (Jstr.v "rows") (Jstr.v "4");
+    At.v (Jstr.v "placeholder") (Jstr.v "Enter text to detect language...");
+  ] [] in
+
+  let result_div = El.div ~at:[At.id (Jstr.v "demo-result")] [
+    El.txt' "Enter text above and click Detect"
+  ] in
+
+  let detect_button = El.button ~at:[At.id (Jstr.v "demo-button")] [El.txt' "Detect Language"] in
+
+  (* Set up click handler *)
+  let detector = Langdetect.create_default () in
+  ignore (Ev.listen Ev.click (fun _ ->
+    let text = Jstr.to_string (El.prop El.Prop.value textarea) in
+    if String.length text > 0 then begin
+      let start = now_ms () in
+      let results = Langdetect.detect detector text in
+      let time_ms = now_ms () -. start in
+
+      let result_html = match results with
+        | [] ->
+          [El.txt' "No language detected (text may be too short)"]
+        | _ ->
+          let items = List.map (fun (r : Langdetect.result) ->
+            El.li [
+              El.strong [El.txt' r.lang];
+              El.txt' (Printf.sprintf " — %.1f%% confidence" (r.prob *. 100.0))
+            ]
+          ) results in
+          [
+            El.p [El.txt' (Printf.sprintf "Detected in %.1fms:" time_ms)];
+            El.ul items
+          ]
+      in
+      El.set_children result_div result_html
+    end
+  ) (El.as_target detect_button));
+
+  (* Only add demo section if container exists *)
+  if El.has_tag_name (Jstr.v "DIV") demo_container then
+    El.set_children demo_container [
+      El.h2 [El.txt' "Try It"];
+      El.div ~at:[At.class' (Jstr.v "demo-area")] [
+        textarea;
+        detect_button;
+        result_div;
+      ]
+    ]
+
+(** Entry point *)
+let () =
+  (* Wait for DOM to be ready *)
+  let ready_state = Jv.get (Jv.get Jv.global "document") "readyState" |> Jv.to_string in
+  if ready_state = "loading" then
+    ignore (Jv.call Jv.global "addEventListener" [|
+      Jv.of_string "DOMContentLoaded";
+      Jv.callback ~arity:1 (fun _ ->
+        run_tests_ui ();
+        setup_demo ()
+      )
+    |])
+  else begin
+    run_tests_ui ();
+    setup_demo ()
+  end
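
The pass/fail rule in `run_test` (the detected code must equal `expected`, and no detection counts as a failure) can also be sketched against the JavaScript API, for instance to run extra cases from a page's own script. This assumes the standalone `langdetect.js` bundle is loaded so that `window.langdetect` exists; `runCases` and the shape of its return value are hypothetical.

```javascript
// Hypothetical runner (not part of langdetect-js): check each
// {name, text, expected} case against api.detect and collect failures.
// `api` is assumed to have the shape of window.langdetect.
function runCases(api, cases) {
  const failures = [];
  for (const { name, text, expected } of cases) {
    const detected = api.detect(text); // language code or null
    // A null result never equals an expected code, so it is a failure too.
    if (detected !== expected) {
      failures.push({ name, expected, detected });
    }
  }
  return {
    total: cases.length,
    passed: cases.length - failures.length,
    failures,
  };
}
```

A caller could log `failures` or render them into a table, much as `create_result_row` does on the OCaml side.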
+2
lib/langdetect.ml
···
   | [] -> None
   | best :: _ -> Some (best.lang, best.prob)

+let supported_languages t = t.lang_list
+
 let create_default ?config () =
   create_packed ?config
     ~ngram_table:Profiles_packed.ngram_table
+4
lib/langdetect.mli
···
 val detect_with_prob : t -> string -> (string * float) option
 (** [detect_with_prob t text] returns the best matching language code with its
     probability, or [None] if no language could be detected. *)
+
+val supported_languages : t -> string array
+(** [supported_languages t] returns an array of language codes that this
+    detector supports. *)
+299
test.html
···
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <title>Langdetect - Language Detection Demo</title>
+  <style>
+    * {
+      box-sizing: border-box;
+    }
+    body {
+      font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
+      line-height: 1.6;
+      max-width: 1000px;
+      margin: 0 auto;
+      padding: 2rem;
+      background: #f5f5f5;
+      color: #333;
+    }
+    h1 {
+      color: #2563eb;
+      border-bottom: 3px solid #3b82f6;
+      padding-bottom: 0.5rem;
+      margin-bottom: 0.5rem;
+    }
+    .subtitle {
+      color: #666;
+      margin-top: 0;
+      margin-bottom: 2rem;
+    }
+    .section {
+      background: white;
+      border: 1px solid #e0e0e0;
+      border-radius: 12px;
+      padding: 1.5rem;
+      margin: 1.5rem 0;
+      box-shadow: 0 2px 4px rgba(0,0,0,0.05);
+    }
+    .section h2 {
+      margin-top: 0;
+      color: #1e40af;
+    }
+    .demo-area {
+      display: flex;
+      flex-direction: column;
+      gap: 1rem;
+    }
+    textarea {
+      width: 100%;
+      padding: 1rem;
+      font-size: 1rem;
+      font-family: inherit;
+      border: 2px solid #e0e0e0;
+      border-radius: 8px;
+      resize: vertical;
+      transition: border-color 0.2s;
+    }
+    textarea:focus {
+      outline: none;
+      border-color: #3b82f6;
+    }
+    button {
+      padding: 0.75rem 2rem;
+      font-size: 1rem;
+      font-weight: 600;
+      cursor: pointer;
+      border: none;
+      border-radius: 8px;
+      background: #2563eb;
+      color: white;
+      transition: all 0.2s;
+      align-self: flex-start;
+    }
+    button:hover {
+      background: #1d4ed8;
+      transform: translateY(-1px);
+    }
+    button:active {
+      transform: translateY(0);
+    }
+    #demo-result {
+      padding: 1rem;
+      background: #f8fafc;
+      border-radius: 8px;
+      border: 1px solid #e2e8f0;
+    }
+    #demo-result ul {
+      margin: 0.5rem 0;
+      padding-left: 1.5rem;
+    }
+    #demo-result li {
+      margin: 0.25rem 0;
+    }
+    .summary {
+      background: #eff6ff;
+      padding: 1rem 1.5rem;
+      border-radius: 8px;
+      margin-bottom: 1rem;
+    }
+    .summary h2 {
+      margin: 0 0 0.5rem 0;
+    }
+    .summary p {
+      margin: 0.25rem 0;
+    }
+    .results-table {
+      width: 100%;
+      border-collapse: collapse;
+      font-size: 0.9rem;
+    }
+    .results-table th,
+    .results-table td {
+      padding: 0.75rem 1rem;
+      text-align: left;
+      border-bottom: 1px solid #e0e0e0;
+    }
+    .results-table th {
+      background: #f8fafc;
+      font-weight: 600;
+      color: #475569;
+    }
+    .results-table tr:hover {
+      background: #f8fafc;
+    }
+    .results-table .code {
+      font-family: 'SF Mono', Monaco, 'Cascadia Code', monospace;
+      font-size: 0.85rem;
+      background: #f1f5f9;
+      padding: 0.2rem 0.4rem;
+      border-radius: 4px;
+    }
+    .results-table .pass {
+      color: #16a34a;
+      font-weight: bold;
+      font-size: 1.1rem;
+    }
+    .results-table .fail {
+      color: #dc2626;
+      font-weight: bold;
+      font-size: 1.1rem;
+    }
+    .loading {
+      text-align: center;
+      padding: 2rem;
+      color: #666;
+    }
+    .api-docs {
+      background: #1e293b;
+      color: #e2e8f0;
+      padding: 1rem;
+      border-radius: 8px;
+      overflow-x: auto;
+    }
+    .api-docs code {
+      color: #7dd3fc;
+    }
+    .api-docs .comment {
+      color: #94a3b8;
+    }
+    .sample-texts {
+      display: grid;
+      grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));
+      gap: 0.5rem;
+      margin-top: 1rem;
+    }
+    .sample-text {
+      padding: 0.5rem 1rem;
+      font-size: 0.85rem;
+      background: #f1f5f9;
+      border: 1px solid #e2e8f0;
+      border-radius: 6px;
+      cursor: pointer;
+      transition: all 0.2s;
+    }
+    .sample-text:hover {
+      background: #e2e8f0;
+      border-color: #cbd5e1;
+    }
+    .sample-text .lang {
+      font-weight: 600;
+      color: #1e40af;
+    }
+  </style>
+</head>
+<body>
+
+  <h1>🌍 Langdetect</h1>
+  <p class="subtitle">Language detection for the browser using n-gram frequency analysis</p>
+
+  <!-- Interactive Demo -->
+  <div class="section" id="demo">
+    <h2>Try It</h2>
+    <div class="demo-area">
+      <textarea id="demo-input" rows="4" placeholder="Enter or paste text to detect its language..."></textarea>
+      <button id="demo-button">Detect Language</button>
+      <div id="demo-result">Enter text above and click Detect</div>
+    </div>
+
+    <p style="margin-top: 1rem; margin-bottom: 0.5rem; color: #666; font-size: 0.9rem;">
+      Click a sample to try:
+    </p>
+    <div class="sample-texts">
+      <div class="sample-text" data-text="The quick brown fox jumps over the lazy dog.">
+        <span class="lang">🇬🇧 English</span>
+      </div>
+      <div class="sample-text" data-text="Bonjour le monde! Comment allez-vous aujourd'hui?">
+        <span class="lang">🇫🇷 French</span>
+      </div>
+      <div class="sample-text" data-text="Guten Tag! Wie geht es Ihnen heute?">
+        <span class="lang">🇩🇪 German</span>
+      </div>
+      <div class="sample-text" data-text="¡Hola mundo! ¿Cómo estás hoy?">
+        <span class="lang">🇪🇸 Spanish</span>
+      </div>
+      <div class="sample-text" data-text="你好世界!今天你好吗?">
+        <span class="lang">🇨🇳 Chinese</span>
+      </div>
+      <div class="sample-text" data-text="こんにちは世界!今日はお元気ですか?">
+        <span class="lang">🇯🇵 Japanese</span>
+      </div>
+      <div class="sample-text" data-text="مرحبا بالعالم! كيف حالك اليوم؟">
+        <span class="lang">🇸🇦 Arabic</span>
+      </div>
+      <div class="sample-text" data-text="Привет мир! Как дела сегодня?">
+        <span class="lang">🇷🇺 Russian</span>
+      </div>
+    </div>
+  </div>
+
+  <!-- Test Results -->
+  <div class="section">
+    <div id="test-results">
+      <p class="loading">Loading tests...</p>
+    </div>
+  </div>
+
+  <!-- API Documentation -->
+  <div class="section">
+    <h2>JavaScript API</h2>
+    <div class="api-docs">
+<pre><span class="comment">// Detect the most likely language</span>
+<code>langdetect.detect</code>("Hello, world!")
+<span class="comment">// → "en"</span>
+
+<span class="comment">// Get detection with confidence score</span>
+<code>langdetect.detectWithProb</code>("Bonjour le monde!")
+<span class="comment">// → { lang: "fr", prob: 0.9999 }</span>
+
+<span class="comment">// Get all matching languages</span>
+<code>langdetect.detectAll</code>("Hello world")
+<span class="comment">// → [{ lang: "en", prob: 0.85 }, { lang: "de", prob: 0.10 }, ...]</span>
+
+<span class="comment">// List supported languages</span>
+<code>langdetect.languages</code>()
+<span class="comment">// → ["ar", "bg", "bn", "ca", "cs", "da", "de", "el", ...]</span></pre>
+    </div>
+  </div>
+
+  <!-- Supported Languages -->
+  <div class="section">
+    <h2>Supported Languages (47)</h2>
+    <p>
+      Arabic, Bengali, Bulgarian, Catalan, Chinese (Simplified), Chinese (Traditional),
+      Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek,
+      Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian,
+      Lithuanian, Macedonian, Malayalam, Norwegian, Panjabi, Persian, Polish, Portuguese,
+      Romanian, Russian, Sinhala, Slovak, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai,
+      Turkish, Ukrainian, Urdu, Vietnamese
+    </p>
+  </div>
+
+  <!-- Load the compiled library -->
+  <script src="_build/default/lib/js/langdetect-tests.js"></script>
+
+  <script>
+    // Handle sample text clicks
+    document.querySelectorAll('.sample-text').forEach(el => {
+      el.addEventListener('click', () => {
+        const text = el.getAttribute('data-text');
+        document.getElementById('demo-input').value = text;
+        document.getElementById('demo-button').click();
+      });
+    });
+
+    // Check if library loaded
+    window.addEventListener('load', () => {
+      if (typeof langdetect === 'undefined') {
+        document.getElementById('demo-result').innerHTML =
+          '<strong style="color: #dc2626;">Library not loaded</strong><br>' +
+          'Run <code>opam exec -- dune build lib/js/langdetect-tests.js</code> first.';
+        document.getElementById('test-results').innerHTML =
+          '<p style="color: #dc2626;"><strong>Tests cannot run:</strong> langdetect.js not found.<br>' +
+          'Build with: <code>opam exec -- dune build lib/js/</code></p>';
+      }
+    });
+  </script>
+
+</body>
+</html>