Git fork

update_unicode.sh: move it into contrib/update-unicode

As it's used only by a tiny minority of the Git developer population,
this script does not belong in the main Git source directory.

Move it into contrib/ and adjust the paths to account for the new
location.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Authored by Beat Bolli and committed by Junio C Hamano
f3eb5492 32c239d1

+61 -41
-1
.gitignore
···
 /config.mak.autogen
 /config.mak.append
 /configure
-/unicode
 /tags
 /TAGS
 /cscope*
+3
contrib/update-unicode/.gitignore
···
+uniset/
+UnicodeData.txt
+EastAsianWidth.txt
+20
contrib/update-unicode/README
···
+TL;DR: Run update_unicode.sh after the publication of a new Unicode
+standard and commit the resulting unicode_widths.h file.
+
+The long version
+================
+
+The Git source code ships the file unicode_widths.h which contains
+tables of zero and double width Unicode code points, respectively.
+These tables are generated using update_unicode.sh in this directory.
+update_unicode.sh itself uses a third-party tool, uniset, to query two
+Unicode data files for the interesting code points.
+
+On first run, update_unicode.sh clones uniset from Github and builds it.
+This requires a current-ish version of autoconf (2.69 works per December
+2016).
+
+On each run, update_unicode.sh checks whether more recent Unicode data
+files are available from the Unicode consortium, and rebuilds the header
+unicode_widths.h with the new data. The new header can then be
+committed.
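The download step can be sketched in isolation. In update_unicode.sh as committed, each data file is fetched only when no local copy exists (the `if ! test -f ...` guards), so picking up newer data presumably means deleting the cached UnicodeData.txt and EastAsianWidth.txt first. The names `fetch_if_missing` and `stub_fetch` are hypothetical and not part of the script; the stub stands in for wget so that the sketch runs offline.

```shell
#!/bin/sh
# Hypothetical stand-in for wget: writes placeholder content to the
# requested file instead of downloading anything.
stub_fetch () {
	echo "placeholder data" >"$1"
}

# Run the given fetch command for a file only if the file is absent,
# mirroring the "if ! test -f FILE; then wget URL; fi" guards.
fetch_if_missing () {
	file=$1
	fetch_cmd=$2
	if ! test -f "$file"
	then
		"$fetch_cmd" "$file"
	fi
}
```

Calling `fetch_if_missing UnicodeData.txt stub_fetch` twice in a row invokes the fetch command only the first time; the second call finds the file and does nothing.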
+38
contrib/update-unicode/update_unicode.sh
···
+#!/bin/sh
+#See http://www.unicode.org/reports/tr44/
+#
+#Me Enclosing_Mark an enclosing combining mark
+#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
+#Cf Format a format control character
+#
+cd "$(dirname "$0")"
+UNICODEWIDTH_H=$(git rev-parse --show-toplevel)/unicode_width.h
+(
+	if ! test -f UnicodeData.txt; then
+		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
+	fi &&
+	if ! test -f EastAsianWidth.txt; then
+		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
+	fi &&
+	if ! test -d uniset; then
+		git clone https://github.com/depp/uniset.git
+	fi &&
+	(
+		cd uniset &&
+		if ! test -x uniset; then
+			autoreconf -i &&
+			./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
+		fi &&
+		make
+	) &&
+	UNICODE_DIR=. && export UNICODE_DIR &&
+	cat >$UNICODEWIDTH_H <<-EOF
+	static const struct interval zero_width[] = {
+		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
+		  grep -v plane)
+	};
+	static const struct interval double_width[] = {
+		$(uniset/uniset --32 eaw:F,W)
+	};
+	EOF
+)
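The header is produced by a here-document whose body contains command substitutions, with `grep -v plane` dropping unwanted lines from the tool's output. The following is a minimal sketch of that technique only. `fake_uniset` and `emit_table` are hypothetical names, and the rows printed by `fake_uniset` are invented placeholders; the real uniset output format is not shown in this commit.

```shell
#!/bin/sh
# Hypothetical stand-in for the uniset call: prints two interval rows
# and one line mentioning "plane" (placeholder rows, assumed format).
fake_uniset () {
	printf '%s\n' '{ 0x0300, 0x036F },' '/* plane 1 */' '{ 0x0483, 0x0489 },'
}

# Print a C table on stdout. Because the EOF delimiter is unquoted, the
# shell expands $(...) inside the here-document; grep -v removes every
# line that contains "plane" before the rows are spliced in.
emit_table () {
	cat <<EOF
static const struct interval zero_width[] = {
$(fake_uniset | grep -v plane)
};
EOF
}
```

Redirecting `emit_table` to a file would correspond to the `cat >$UNICODEWIDTH_H` step. The script itself uses `<<-EOF`, which additionally strips leading tabs from the body so that the here-document can be indented along with the surrounding code.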
-40
update_unicode.sh
···
-#!/bin/sh
-#See http://www.unicode.org/reports/tr44/
-#
-#Me Enclosing_Mark an enclosing combining mark
-#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
-#Cf Format a format control character
-#
-UNICODEWIDTH_H=../unicode_width.h
-if ! test -d unicode; then
-	mkdir unicode
-fi &&
-( cd unicode &&
-	if ! test -f UnicodeData.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
-	fi &&
-	if ! test -f EastAsianWidth.txt; then
-		wget http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
-	fi &&
-	if ! test -d uniset; then
-		git clone https://github.com/depp/uniset.git
-	fi &&
-	(
-		cd uniset &&
-		if ! test -x uniset; then
-			autoreconf -i &&
-			./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
-		fi &&
-		make
-	) &&
-	UNICODE_DIR=. && export UNICODE_DIR &&
-	cat >$UNICODEWIDTH_H <<-EOF
-	static const struct interval zero_width[] = {
-		$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD |
-		  grep -v plane)
-	};
-	static const struct interval double_width[] = {
-		$(uniset/uniset --32 eaw:F,W)
-	};
-	EOF
-)
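The removed script wrote to ../unicode_width.h relative to its unicode/ working directory, whereas the relocated script changes into its own directory and asks Git for the top of the work tree. A minimal sketch of that path resolution follows, assuming git is installed; `header_path` is a hypothetical helper, not something the commit defines.

```shell
#!/bin/sh
# Hypothetical helper: given the path to a script inside a Git work
# tree, print where unicode_width.h would be written. It mirrors
#   cd "$(dirname "$0")"
#   UNICODEWIDTH_H=$(git rev-parse --show-toplevel)/unicode_width.h
# but runs in a subshell so the caller's working directory is unchanged.
header_path () {
	script=$1
	(
		cd "$(dirname "$script")" &&
		printf '%s/unicode_width.h\n' "$(git rev-parse --show-toplevel)"
	)
}
```

Because the directory change comes first, the result does not depend on where the caller happens to be: `header_path /path/to/git/contrib/update-unicode/update_unicode.sh` prints the same path from any working directory.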