3.4. Locale Encoding

  • utf-8 - the most common encoding of Unicode, the international character standard (should always be used!)

  • iso-8859-1 - ISO standard for Western Europe and USA

  • iso-8859-2 - ISO standard for Central Europe (including Poland)

  • cp1250 or windows-1250 - Central European encoding on Windows

  • cp1251 or windows-1251 - Cyrillic (Eastern European) encoding on Windows

  • cp1252 or windows-1252 - Western European encoding on Windows

  • ASCII - ASCII characters only

  • Since Windows 10 version 1903, UTF-8 is the default encoding for Notepad!

Encodings:

3.4.1. SetUp

>>> from pathlib import Path
>>> Path('/tmp/myfile.txt').unlink(missing_ok=True)

3.4.2. ASCII Table

  • Standard (0–127)

  • Extended (128–255)

  • Standard ASCII is the same everywhere

  • Extended ASCII is Operating System dependent
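The difference between the standard and extended ranges can be checked directly. The following is a minimal sketch; the choice of codecs and of the sample byte 0xA5 is only illustrative:

```python
# Bytes 0-127 decode to the same characters in every ASCII-compatible codec.
ascii_bytes = bytes(range(128))
codecs_to_compare = ['ascii', 'iso-8859-1', 'iso-8859-2', 'cp1250', 'cp1252', 'utf-8']
decoded = {name: ascii_bytes.decode(name) for name in codecs_to_compare}
same_everywhere = len(set(decoded.values())) == 1

# A single byte above 127 maps to a different character in each code page.
high_byte = b'\xa5'
variants = {name: high_byte.decode(name)
            for name in ['iso-8859-1', 'iso-8859-2', 'cp1250']}

print(same_everywhere)  # True
print(variants)         # {'iso-8859-1': '¥', 'iso-8859-2': 'Ľ', 'cp1250': 'Ą'}
```

The same byte therefore reads as a yen sign, a Slovak letter, or a Polish letter, depending only on which code page the reader assumes.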

../../_images/locale-encoding-ascii-std.png

Figure 3.26. ASCII table 4

../../_images/locale-encoding-ascii-ext.png

Figure 3.27. ASCII table extended 4

../../_images/locale-encoding-ascii-all.jpg

Figure 3.28. Source: 9

3.4.3. Unicode

../../_images/locale-encoding-unicode2.png

Figure 3.29. Unicode 5

../../_images/locale-encoding-unicode3.png

Figure 3.30. Unicode 6

../../_images/locale-encoding-utf-polish.png

Figure 3.31. Unicode Polish Chars 10

3.4.4. Windows Encoding

../../_images/locale-encoding-windows2000-notepad-saveas.png

Figure 3.32. Windows 2000 Notepad "Save As" window with possibility to select encoding. UTF-8 is not selected by default... 1

../../_images/locale-encoding-windows10-notepad-saveas.png

Figure 3.33. Windows 10 Notepad "Save As" window with possibility to select encoding.

Since Windows 10.1903 (May 2019) notepad writes files in UTF-8 by default! 2 3

../../_images/locale-encoding-win10-21H1-notepad-1.png

Figure 3.34. Windows 10 Notepad "Save As" window with possibility to select encoding. Since Windows 10.1903 (May 2019) notepad writes files in UTF-8 by default!

../../_images/locale-encoding-win10-21H1-notepad-2.png

Figure 3.35. Windows 10 Notepad "Save As" window with possibility to select encoding. Since Windows 10.1903 (May 2019) notepad writes files in UTF-8 by default!
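Notepad's default matters to Python code because open() called without an encoding argument falls back to the locale's preferred encoding, which on Windows has traditionally been a legacy code page rather than UTF-8. A small sketch for inspecting what the current machine uses; the printed values depend on the operating system and its configuration, so none are shown here:

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= argument is given.
preferred = locale.getpreferredencoding(False)

# Encoding used to translate file names between str and bytes.
filesystem = sys.getfilesystemencoding()

print(preferred, filesystem)
```

Passing encoding='utf-8' explicitly to open() avoids depending on either value.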

3.4.5. Str vs Bytes

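  • The split between str and bytes was a big change in Python 3

  • In Python 2, str was bytes

  • In Python 3, str is Unicode text (a sequence of code points); UTF-8 is the default encoding used to turn it into bytes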

>>> text = 'Księżyc'
>>> text
'Księżyc'
>>> text = b'Księżyc'
Traceback (most recent call last):
SyntaxError: bytes can only contain ASCII literal characters

The default encoding is UTF-8. Encoding names are case-insensitive. cp1250 and windows-1250 are aliases of the same codec; for this particular word, iso-8859-2 happens to produce the same bytes as well:

>>> text = 'Księżyc'
>>>
>>> text.encode()
b'Ksi\xc4\x99\xc5\xbcyc'
>>> text.encode('utf-8')
b'Ksi\xc4\x99\xc5\xbcyc'
>>> text.encode('iso-8859-2')
b'Ksi\xea\xbfyc'
>>> text.encode('cp1250')
b'Ksi\xea\xbfyc'
>>> text.encode('windows-1250')
b'Ksi\xea\xbfyc'
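Alias and case handling can be verified with the standard codecs module. This sketch resolves a few spellings (chosen here only as examples) to their canonical codec name:

```python
import codecs

# Different spellings and letter cases of the same Windows code page.
names = ['cp1250', 'CP1250', 'windows-1250', 'Windows-1250']

# codecs.lookup() normalizes the name and returns a CodecInfo object.
canonical = {codecs.lookup(name).name for name in names}

print(canonical)  # {'cp1250'}
```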

Note how the length changes after encoding: len() counts characters for a str, but bytes for a bytes object:

>>> text = 'Księżyc'
>>> text
'Księżyc'
>>> len(text)
7
>>> text = 'Księżyc'.encode()
>>> text
b'Ksi\xc4\x99\xc5\xbcyc'
>>> len(text)
9

Note also that a single non-ASCII character encodes to more than one byte in UTF-8:

>>> 'ó'.encode()
b'\xc3\xb3'

Although the escaped form \xc3\xb3 is eight characters long on screen, the object holds only two bytes:

>>> len(b'\xc3\xb3')
2

Here is the UTF-8 encoding of each Polish diacritic (accented character):

>>> 'ą'.encode()
b'\xc4\x85'
>>> 'ć'.encode()
b'\xc4\x87'
>>> 'ę'.encode()
b'\xc4\x99'
>>> 'ł'.encode()
b'\xc5\x82'
>>> 'ń'.encode()
b'\xc5\x84'
>>> 'ó'.encode()
b'\xc3\xb3'
>>> 'ś'.encode()
b'\xc5\x9b'
>>> 'ż'.encode()
b'\xc5\xbc'
>>> 'ź'.encode()
b'\xc5\xba'

Iteration differs too: iterating over a str yields one-character strings, while iterating over bytes yields integers in the range 0-255:

>>> text = 'Księżyc'
>>>
>>> for character in text:
...     print(character)
K
s
i
ę
ż
y
c
>>>
>>> for character in text.encode():
...     print(character)
75
115
105
196
153
197
188
121
99
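The examples above only go from str to bytes. Going back uses bytes.decode() with the same codec. A minimal round-trip sketch, reusing the word from this section:

```python
text = 'Ksi\u0119\u017cyc'  # 'Księżyc', written with escapes to avoid ambiguity

sizes = {}
for codec in ('utf-8', 'iso-8859-2', 'cp1250'):
    data = text.encode(codec)          # str -> bytes
    restored = data.decode(codec)      # bytes -> str, same codec
    assert restored == text            # the round trip is lossless
    sizes[codec] = len(data)

print(sizes)  # {'utf-8': 9, 'iso-8859-2': 7, 'cp1250': 7}
```

The single-byte code pages need 7 bytes for 7 characters, while UTF-8 spends two bytes on each of the two Polish letters.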

3.4.6. UTF-8

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE, mode='w', encoding='utf-8') as file:
...     file.write('José Jiménez')
12
>>>
>>> with open(FILE, encoding='utf-8') as file:
...     print(file.read())
José Jiménez

../../_images/locale-encoding-utf.png

Figure 3.36. UTF-8. Source: 7

../../_images/locale-encoding-utf2.jpg

Figure 3.37. UTF-8. Source: 8
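UTF-8 is a variable-width encoding: a character takes between one and four bytes depending on its code point. A short sketch with four sample characters (picked here for illustration) shows each width:

```python
# One sample per UTF-8 width: 'A', 'é', '€' and the rocket emoji.
samples = ['A', '\u00e9', '\u20ac', '\U0001F680']

# Number of bytes each character occupies once encoded.
widths = {char: len(char.encode('utf-8')) for char in samples}

for char, width in widths.items():
    print(f'U+{ord(char):04X} -> {width} byte(s)')
```

ASCII characters keep their single-byte form, which is why plain English text is byte-for-byte identical in ASCII and UTF-8.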

3.4.7. Unicode Encode Error

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE, mode='w', encoding='cp1250') as file:
...     file.write('José Jiménez')
12
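The write above succeeds because cp1250 contains the letter é. A UnicodeEncodeError is raised only when the target encoding has no byte for some character. The sketch below provokes the error with the ascii codec and then shows the errors= parameter as one possible way to handle it; whether replacing characters is acceptable depends on the application:

```python
text = 'Jos\u00e9 Jim\u00e9nez'  # 'José Jiménez'

# cp1250 has a byte for 'é', so encoding works.
ok = text.encode('cp1250')

# ASCII has no byte for 'é', so encoding fails.
failed_at = None
try:
    text.encode('ascii')
except UnicodeEncodeError as error:
    failed_at = error.start      # index of the first offending character
    print(error)

# errors='replace' substitutes '?' instead of raising.
replaced = text.encode('ascii', errors='replace')
print(replaced)  # b'Jos? Jim?nez'
```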

3.4.8. Unicode Decode Error

>>> FILE = r'/tmp/myfile.txt'
>>>
>>> with open(FILE, mode='w', encoding='utf-8') as file:
...     file.write('José Jiménez')
12
>>>
>>> with open(FILE, encoding='cp1250') as file:
...     print(file.read())
JosĂ© JimĂ©nez
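Reading UTF-8 bytes as cp1250 does not raise anything, because each byte involved maps to some cp1250 character; the text is silently garbled instead. A UnicodeDecodeError appears in the opposite direction, when bytes that are not valid UTF-8 are decoded as UTF-8. A minimal sketch of that case:

```python
# Bytes produced by a legacy code page: 'é' is the single byte 0xE9.
data = 'Jos\u00e9 Jim\u00e9nez'.encode('cp1250')

# 0xE9 starts a multi-byte sequence in UTF-8, but no continuation byte follows.
failed_at = None
try:
    data.decode('utf-8')
except UnicodeDecodeError as error:
    failed_at = error.start      # index of the first invalid byte
    print(error)

# errors='replace' inserts U+FFFD (the replacement character) instead of raising.
lenient = data.decode('utf-8', errors='replace')
print(lenient)
```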

3.4.9. Escape Characters

../../_images/type-machine2.jpg

Figure 3.38. Why do we have '\r\n' on Windows?

../../_images/type-machine1.gif

Figure 3.39. Source: [#typewriter]_

Frequently used escape characters:

  • \n - New line (ENTER)

  • \t - Horizontal Tab (TAB)

  • \' - Single quote ' (escape in single quoted strings)

  • \" - Double quote " (escape in double quoted strings)

  • \\ - Backslash \ (to indicate, that this is not escape char)

Less frequently used escape characters:

  • \a - Bell (BEL)

  • \b - Backspace (BS)

  • \f - New page (FF - Form Feed)

  • \v - Vertical Tab (VT)

  • \uF680 - Character with 16-bit (2 bytes) hex value F680

  • \U0001F680 - Character with 32-bit (4 bytes) hex value 0001F680

  • \101 - Character with octal value 101, here A (written \ooo: up to three octal digits, without a letter o)

  • \x41 - Character with hex value 41, here A (written \xhh: exactly two hex digits)
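The numeric escapes differ in how many digits they consume, which is easy to get wrong. The following sketch compares the forms and shows what happens when more digits follow \x than it accepts:

```python
# Four ways to write the same character.
same = ['\x41', '\101', '\u0041', '\U00000041']
print(same)  # ['A', 'A', 'A', 'A']

# An escape is one character; in a raw string the backslash stays literal.
newline_length = len('\n')   # 1
raw_length = len(r'\n')      # 2

# \x takes exactly two hex digits, so this is chr(0x1F) followed by '680'.
parts = '\x1F680'
print(len(parts), parts[1:])  # 4 680
```

Code points above FF therefore need \u (four digits) or \U (eight digits).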

Emoji:

>>> print('\U0001F680')
🚀
>>> a = '\U0001F9D1'  # 🧑
>>> b = '\U0000200D'  # zero width joiner (ZWJ), invisible
>>> c = '\U0001F680'  # 🚀
>>>
>>> astronaut = a + b + c
>>> print(astronaut)
🧑‍🚀
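What is displayed as one astronaut is still three code points joined by an invisible character. A short sketch using the standard unicodedata module to inspect the sequence (the names returned depend on the Unicode database bundled with the interpreter, so a reasonably recent Python 3 is assumed):

```python
import unicodedata

# Person + zero width joiner + rocket, as in the example above.
astronaut = '\U0001F9D1\U0000200D\U0001F680'

code_points = len(astronaut)                   # counted by len()
utf8_bytes = len(astronaut.encode('utf-8'))    # 4 + 3 + 4 bytes
names = [unicodedata.name(char) for char in astronaut]

print(code_points, utf8_bytes)
print(names)
```

A renderer that does not support this combination shows the person and the rocket as two separate emoji.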

3.4.10. Encoding

Table 3.25. Encoding

dec  hex   oct    utf-8  iso-8859-2  cp1250  name
0    0x0   0o0    NULL   NULL        NULL    NULL
1    0x1   0o1                               SOH
2    0x2   0o2                               STX
3    0x3   0o3                               ETX
4    0x4   0o4                               EOT
5    0x5   0o5                               ENQ
6    0x6   0o6                               ACK
7    0x7   0o7                               BEL
8    0x8   0o10                              BS
9    0x9   0o11   \t     \t          \t      TAB
10   0xa   0o12   \n     \n          \n      LF
11   0xb   0o13                              VT
12   0xc   0o14                              FF
13   0xd   0o15   \r     \r          \r      CR
14   0xe   0o16                              SO
15   0xf   0o17                              SI
16   0x10  0o20                              DLE
17   0x11  0o21                              DC1
18   0x12  0o22                              DC2
19   0x13  0o23                              DC3
20   0x14  0o24                              DC4
21   0x15  0o25                              NAK
22   0x16  0o26                              SYN
23   0x17  0o27                              ETB
24   0x18  0o30                              CAN
25   0x19  0o31                              EM
26   0x1a  0o32                              SUB
27   0x1b  0o33                              ESC
28   0x1c  0o34                              FS
29   0x1d  0o35                              GS
30   0x1e  0o36                              RS
31   0x1f  0o37                              US
32   0x20  0o40                              SPACE
33   0x21  0o41   !      !           !       EXCLAMATION MARK
34   0x22  0o42   "      "           "       QUOTATION MARK
35   0x23  0o43   #      #           #       NUMBER SIGN
36   0x24  0o44   $      $           $       DOLLAR SIGN
37   0x25  0o45   %      %           %       PERCENT SIGN
38   0x26  0o46   &      &           &       AMPERSAND
39   0x27  0o47   '      '           '       APOSTROPHE
40   0x28  0o50   (      (           (       LEFT PARENTHESIS
41   0x29  0o51   )      )           )       RIGHT PARENTHESIS
42   0x2a  0o52   *      *           *       ASTERISK
43   0x2b  0o53   +      +           +       PLUS SIGN
44   0x2c  0o54   ,      ,           ,       COMMA
45   0x2d  0o55   -      -           -       HYPHEN-MINUS
46   0x2e  0o56   .      .           .       FULL STOP
47   0x2f  0o57   /      /           /       SOLIDUS
48   0x30  0o60   0      0           0       DIGIT ZERO
49   0x31  0o61   1      1           1       DIGIT ONE
50   0x32  0o62   2      2           2       DIGIT TWO
51   0x33  0o63   3      3           3       DIGIT THREE
52   0x34  0o64   4      4           4       DIGIT FOUR
53   0x35  0o65   5      5           5       DIGIT FIVE
54   0x36  0o66   6      6           6       DIGIT SIX
55   0x37  0o67   7      7           7       DIGIT SEVEN
56   0x38  0o70   8      8           8       DIGIT EIGHT
57   0x39  0o71   9      9           9       DIGIT NINE
58   0x3a  0o72   :      :           :       COLON
59   0x3b  0o73   ;      ;           ;       SEMICOLON
60   0x3c  0o74   <      <           <       LESS-THAN SIGN
61   0x3d  0o75   =      =           =       EQUALS SIGN
62   0x3e  0o76   >      >           >       GREATER-THAN SIGN
63   0x3f  0o77   ?      ?           ?       QUESTION MARK
64   0x40  0o100  @      @           @       COMMERCIAL AT
65   0x41  0o101  A      A           A       LATIN CAPITAL LETTER A
66   0x42  0o102  B      B           B       LATIN CAPITAL LETTER B
67   0x43  0o103  C      C           C       LATIN CAPITAL LETTER C
68   0x44  0o104  D      D           D       LATIN CAPITAL LETTER D
69   0x45  0o105  E      E           E       LATIN CAPITAL LETTER E
70   0x46  0o106  F      F           F       LATIN CAPITAL LETTER F
71   0x47  0o107  G      G           G       LATIN CAPITAL LETTER G
72   0x48  0o110  H      H           H       LATIN CAPITAL LETTER H
73   0x49  0o111  I      I           I       LATIN CAPITAL LETTER I
74   0x4a  0o112  J      J           J       LATIN CAPITAL LETTER J
75   0x4b  0o113  K      K           K       LATIN CAPITAL LETTER K
76   0x4c  0o114  L      L           L       LATIN CAPITAL LETTER L
77   0x4d  0o115  M      M           M       LATIN CAPITAL LETTER M
78   0x4e  0o116  N      N           N       LATIN CAPITAL LETTER N
79   0x4f  0o117  O      O           O       LATIN CAPITAL LETTER O
80   0x50  0o120  P      P           P       LATIN CAPITAL LETTER P
81   0x51  0o121  Q      Q           Q       LATIN CAPITAL LETTER Q
82   0x52  0o122  R      R           R       LATIN CAPITAL LETTER R
83   0x53  0o123  S      S           S       LATIN CAPITAL LETTER S
84   0x54  0o124  T      T           T       LATIN CAPITAL LETTER T
85   0x55  0o125  U      U           U       LATIN CAPITAL LETTER U
86   0x56  0o126  V      V           V       LATIN CAPITAL LETTER V
87   0x57  0o127  W      W           W       LATIN CAPITAL LETTER W
88   0x58  0o130  X      X           X       LATIN CAPITAL LETTER X
89   0x59  0o131  Y      Y           Y       LATIN CAPITAL LETTER Y
90   0x5a  0o132  Z      Z           Z       LATIN CAPITAL LETTER Z
91   0x5b  0o133  [      [           [       LEFT SQUARE BRACKET
92   0x5c  0o134  \      \           \       REVERSE SOLIDUS
93   0x5d  0o135  ]      ]           ]       RIGHT SQUARE BRACKET
94   0x5e  0o136  ^      ^           ^       CIRCUMFLEX ACCENT
95   0x5f  0o137  _      _           _       LOW LINE
96   0x60  0o140  `      `           `       GRAVE ACCENT
97   0x61  0o141  a      a           a       LATIN SMALL LETTER A
98   0x62  0o142  b      b           b       LATIN SMALL LETTER B
99   0x63  0o143  c      c           c       LATIN SMALL LETTER C
100  0x64  0o144  d      d           d       LATIN SMALL LETTER D
101  0x65  0o145  e      e           e       LATIN SMALL LETTER E
102  0x66  0o146  f      f           f       LATIN SMALL LETTER F
103  0x67  0o147  g      g           g       LATIN SMALL LETTER G
104  0x68  0o150  h      h           h       LATIN SMALL LETTER H
105  0x69  0o151  i      i           i       LATIN SMALL LETTER I
106  0x6a  0o152  j      j           j       LATIN SMALL LETTER J
107  0x6b  0o153  k      k           k       LATIN SMALL LETTER K
108  0x6c  0o154  l      l           l       LATIN SMALL LETTER L
109  0x6d  0o155  m      m           m       LATIN SMALL LETTER M
110  0x6e  0o156  n      n           n       LATIN SMALL LETTER N
111  0x6f  0o157  o      o           o       LATIN SMALL LETTER O
112  0x70  0o160  p      p           p       LATIN SMALL LETTER P
113  0x71  0o161  q      q           q       LATIN SMALL LETTER Q
114  0x72  0o162  r      r           r       LATIN SMALL LETTER R
115  0x73  0o163  s      s           s       LATIN SMALL LETTER S
116  0x74  0o164  t      t           t       LATIN SMALL LETTER T
117  0x75  0o165  u      u           u       LATIN SMALL LETTER U
118  0x76  0o166  v      v           v       LATIN SMALL LETTER V
119  0x77  0o167  w      w           w       LATIN SMALL LETTER W
120  0x78  0o170  x      x           x       LATIN SMALL LETTER X
121  0x79  0o171  y      y           y       LATIN SMALL LETTER Y
122  0x7a  0o172  z      z           z       LATIN SMALL LETTER Z
123  0x7b  0o173  {      {           {       LEFT CURLY BRACKET
124  0x7c  0o174  |      |           |       VERTICAL LINE
125  0x7d  0o175  }      }           }       RIGHT CURLY BRACKET
126  0x7e  0o176  ~      ~           ~       TILDE
127  0x7f  0o177                             DEL
128  0x80  0o200  €      €
129  0x81  0o201
130  0x82  0o202  ‚      ‚
131  0x83  0o203  ƒ      ƒ
132  0x84  0o204  „      „
133  0x85  0o205
134  0x86  0o206  †      †
135  0x87  0o207  ‡      ‡
136  0x88  0o210  ˆ      ˆ
137  0x89  0o211  ‰      ‰
138  0x8a  0o212  Š      Š           Š
139  0x8b  0o213  ‹      ‹
140  0x8c  0o214  Œ      Œ           Ś
141  0x8d  0o215                     Ť
142  0x8e  0o216  Ž      Ž           Ž
143  0x8f  0o217                     Ź
144  0x90  0o220
145  0x91  0o221  ‘      ‘
146  0x92  0o222  ’      ’
147  0x93  0o223  “      “
148  0x94  0o224  ”      ”
149  0x95  0o225  •      •
150  0x96  0o226  –      –
151  0x97  0o227  —      —
152  0x98  0o230  ˜      ˜
153  0x99  0o231  ™      ™
154  0x9a  0o232  š      š           š
155  0x9b  0o233  ›      ›
156  0x9c  0o234  œ      œ           ś
157  0x9d  0o235                     ť
158  0x9e  0o236  ž      ž           ž
159  0x9f  0o237  Ÿ      Ÿ           ź
160  0xa0  0o240                             NO-BREAK SPACE
161  0xa1  0o241  ¡      Ą           ˇ       INVERTED EXCLAMATION MARK
162  0xa2  0o242  ¢      ˘           ˘       CENT SIGN
163  0xa3  0o243  £      Ł           Ł       POUND SIGN
164  0xa4  0o244  ¤      ¤           ¤       CURRENCY SIGN
165  0xa5  0o245  ¥      Ľ           Ą       YEN SIGN
166  0xa6  0o246  ¦      Ś           ¦       BROKEN BAR
167  0xa7  0o247  §      §           §       SECTION SIGN
168  0xa8  0o250  ¨      ¨           ¨       DIAERESIS
169  0xa9  0o251  ©      Š           ©       COPYRIGHT SIGN
170  0xaa  0o252  ª      Ş           Ş       FEMININE ORDINAL INDICATOR
171  0xab  0o253  «      Ť           «       LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
172  0xac  0o254  ¬      Ź           ¬       NOT SIGN
173  0xad  0o255                             SOFT HYPHEN
174  0xae  0o256  ®      Ž           ®       REGISTERED SIGN
175  0xaf  0o257  ¯      Ż           Ż       MACRON
176  0xb0  0o260  °      °           °       DEGREE SIGN
177  0xb1  0o261  ±      ą           ±       PLUS-MINUS SIGN
178  0xb2  0o262  ²      ˛           ˛       SUPERSCRIPT TWO
179  0xb3  0o263  ³      ł           ł       SUPERSCRIPT THREE
180  0xb4  0o264  ´      ´           ´       ACUTE ACCENT
181  0xb5  0o265  µ      ľ           µ       MICRO SIGN
182  0xb6  0o266  ¶      ś           ¶       PILCROW SIGN
183  0xb7  0o267  ·      ˇ           ·       MIDDLE DOT
184  0xb8  0o270  ¸      ¸           ¸       CEDILLA
185  0xb9  0o271  ¹      š           ą       SUPERSCRIPT ONE
186  0xba  0o272  º      ş           ş       MASCULINE ORDINAL INDICATOR
187  0xbb  0o273  »      ť           »       RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
188  0xbc  0o274  ¼      ź           Ľ       VULGAR FRACTION ONE QUARTER
189  0xbd  0o275  ½      ˝           ˝       VULGAR FRACTION ONE HALF
190  0xbe  0o276  ¾      ž           ľ       VULGAR FRACTION THREE QUARTERS
191  0xbf  0o277  ¿      ż           ż       INVERTED QUESTION MARK
192  0xc0  0o300  À      Ŕ           Ŕ       LATIN CAPITAL LETTER A WITH GRAVE
193  0xc1  0o301  Á      Á           Á       LATIN CAPITAL LETTER A WITH ACUTE
194  0xc2  0o302  Â      Â           Â       LATIN CAPITAL LETTER A WITH CIRCUMFLEX
195  0xc3  0o303  Ã      Ă           Ă       LATIN CAPITAL LETTER A WITH TILDE
196  0xc4  0o304  Ä      Ä           Ä       LATIN CAPITAL LETTER A WITH DIAERESIS
197  0xc5  0o305  Å      Ĺ           Ĺ       LATIN CAPITAL LETTER A WITH RING ABOVE
198  0xc6  0o306  Æ      Ć           Ć       LATIN CAPITAL LETTER AE
199  0xc7  0o307  Ç      Ç           Ç       LATIN CAPITAL LETTER C WITH CEDILLA
200  0xc8  0o310  È      Č           Č       LATIN CAPITAL LETTER E WITH GRAVE
201  0xc9  0o311  É      É           É       LATIN CAPITAL LETTER E WITH ACUTE
202  0xca  0o312  Ê      Ę           Ę       LATIN CAPITAL LETTER E WITH CIRCUMFLEX
203  0xcb  0o313  Ë      Ë           Ë       LATIN CAPITAL LETTER E WITH DIAERESIS
204  0xcc  0o314  Ì      Ě           Ě       LATIN CAPITAL LETTER I WITH GRAVE
205  0xcd  0o315  Í      Í           Í       LATIN CAPITAL LETTER I WITH ACUTE
206  0xce  0o316  Î      Î           Î       LATIN CAPITAL LETTER I WITH CIRCUMFLEX
207  0xcf  0o317  Ï      Ď           Ď       LATIN CAPITAL LETTER I WITH DIAERESIS
208  0xd0  0o320  Ð      Đ           Đ       LATIN CAPITAL LETTER ETH
209  0xd1  0o321  Ñ      Ń           Ń       LATIN CAPITAL LETTER N WITH TILDE
210  0xd2  0o322  Ò      Ň           Ň       LATIN CAPITAL LETTER O WITH GRAVE
211  0xd3  0o323  Ó      Ó           Ó       LATIN CAPITAL LETTER O WITH ACUTE
212  0xd4  0o324  Ô      Ô           Ô       LATIN CAPITAL LETTER O WITH CIRCUMFLEX
213  0xd5  0o325  Õ      Ő           Ő       LATIN CAPITAL LETTER O WITH TILDE
214  0xd6  0o326  Ö      Ö           Ö       LATIN CAPITAL LETTER O WITH DIAERESIS
215  0xd7  0o327  ×      ×           ×       MULTIPLICATION SIGN
216  0xd8  0o330  Ø      Ř           Ř       LATIN CAPITAL LETTER O WITH STROKE
217  0xd9  0o331  Ù      Ů           Ů       LATIN CAPITAL LETTER U WITH GRAVE
218  0xda  0o332  Ú      Ú           Ú       LATIN CAPITAL LETTER U WITH ACUTE
219  0xdb  0o333  Û      Ű           Ű       LATIN CAPITAL LETTER U WITH CIRCUMFLEX
220  0xdc  0o334  Ü      Ü           Ü       LATIN CAPITAL LETTER U WITH DIAERESIS
221  0xdd  0o335  Ý      Ý           Ý       LATIN CAPITAL LETTER Y WITH ACUTE
222  0xde  0o336  Þ      Ţ           Ţ       LATIN CAPITAL LETTER THORN
223  0xdf  0o337  ß      ß           ß       LATIN SMALL LETTER SHARP S
224  0xe0  0o340  à      ŕ           ŕ       LATIN SMALL LETTER A WITH GRAVE
225  0xe1  0o341  á      á           á       LATIN SMALL LETTER A WITH ACUTE
226  0xe2  0o342  â      â           â       LATIN SMALL LETTER A WITH CIRCUMFLEX
227  0xe3  0o343  ã      ă           ă       LATIN SMALL LETTER A WITH TILDE
228  0xe4  0o344  ä      ä           ä       LATIN SMALL LETTER A WITH DIAERESIS
229  0xe5  0o345  å      ĺ           ĺ       LATIN SMALL LETTER A WITH RING ABOVE
230  0xe6  0o346  æ      ć           ć       LATIN SMALL LETTER AE
231  0xe7  0o347  ç      ç           ç       LATIN SMALL LETTER C WITH CEDILLA
232  0xe8  0o350  è      č           č       LATIN SMALL LETTER E WITH GRAVE
233  0xe9  0o351  é      é           é       LATIN SMALL LETTER E WITH ACUTE
234  0xea  0o352  ê      ę           ę       LATIN SMALL LETTER E WITH CIRCUMFLEX
235  0xeb  0o353  ë      ë           ë       LATIN SMALL LETTER E WITH DIAERESIS
236  0xec  0o354  ì      ě           ě       LATIN SMALL LETTER I WITH GRAVE
237  0xed  0o355  í      í           í       LATIN SMALL LETTER I WITH ACUTE
238  0xee  0o356  î      î           î       LATIN SMALL LETTER I WITH CIRCUMFLEX
239  0xef  0o357  ï      ď           ď       LATIN SMALL LETTER I WITH DIAERESIS
240  0xf0  0o360  ð      đ           đ       LATIN SMALL LETTER ETH
241  0xf1  0o361  ñ      ń           ń       LATIN SMALL LETTER N WITH TILDE
242  0xf2  0o362  ò      ň           ň       LATIN SMALL LETTER O WITH GRAVE
243  0xf3  0o363  ó      ó           ó       LATIN SMALL LETTER O WITH ACUTE
244  0xf4  0o364  ô      ô           ô       LATIN SMALL LETTER O WITH CIRCUMFLEX
245  0xf5  0o365  õ      ő           ő       LATIN SMALL LETTER O WITH TILDE
246  0xf6  0o366  ö      ö           ö       LATIN SMALL LETTER O WITH DIAERESIS
247  0xf7  0o367  ÷      ÷           ÷       DIVISION SIGN
248  0xf8  0o370  ø      ř           ř       LATIN SMALL LETTER O WITH STROKE
249  0xf9  0o371  ù      ů           ů       LATIN SMALL LETTER U WITH GRAVE
250  0xfa  0o372  ú      ú           ú       LATIN SMALL LETTER U WITH ACUTE
251  0xfb  0o373  û      ű           ű       LATIN SMALL LETTER U WITH CIRCUMFLEX
252  0xfc  0o374  ü      ü           ü       LATIN SMALL LETTER U WITH DIAERESIS
253  0xfd  0o375  ý      ý           ý       LATIN SMALL LETTER Y WITH ACUTE
254  0xfe  0o376  þ      ţ           ţ       LATIN SMALL LETTER THORN
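A table of this kind can be regenerated from the standard library. The sketch below is one possible way to do it and is not necessarily how the table above was produced; the helper names visible and table_row are made up for this example, and the first character column is simply the Unicode code point with the same number:

```python
import unicodedata


def visible(char):
    """Return char if it can be displayed, otherwise an empty string."""
    return char if char.isprintable() and char != '\ufffd' else ''


def table_row(code):
    """Build one row: dec, hex, oct, three character columns, Unicode name."""
    raw = bytes([code])
    return [
        str(code),
        hex(code),
        oct(code),
        visible(chr(code)),                                    # Unicode code point
        visible(raw.decode('iso-8859-2', errors='replace')),   # ISO 8859-2 byte
        visible(raw.decode('cp1250', errors='replace')),       # Windows-1250 byte
        unicodedata.name(chr(code), ''),                       # '' for control characters
    ]


for code in (65, 165, 234):
    print(table_row(code))
```

Control characters have no name in the Unicode database, so abbreviations such as SOH or TAB would have to come from a separate lookup.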

3.4.11. Further Reading

3.4.12. References

1. redhotwords.com. Windows 2000 Notepad. http://redhotwords.com/assets/Uninotepadunicode.png

2. digitalcitizen.life. Windows 10 Notepad. https://www.digitalcitizen.life/sites/default/files/gdrive/windows_notepad/notepad_10.png

3. https://docs.microsoft.com/en-us/windows/whats-new/whats-new-windows-10-version-1903

4. https://www.oreilly.com/library/view/c/9781482214512/K21756_A002.xhtml

5. http://www.gammon.com.au/unicode/gbk.svg.png

6. http://cdn.ilovefreesoftware.com/wp-content/uploads/2016/10/unicode-Character-list-1.png

7. https://camo.githubusercontent.com/7806142e30089cac76da9fe9fb1c5bbd0521cde6/68747470733a2f2f692e696d6775722e636f6d2f7a414d74436a622e706e67

8. https://i.pinimg.com/736x/12/e2/37/12e237271c063313762fcafa1cf58e39--web-development-tools.jpg

9. https://www.keepandshare.com/userpics/r/o/b/e/rt/2019-12/sb/screen_shot_2019_12_01_at_3.26.20_pm-34867850.jpg?ts=1575242835

10. Antibody Software. Wizkey Unicode Browser. Year: 2022. Retrieved: 2022-06-13. URL: https://antibody-software.com/images/wizkey_unicode_browser.png