[Swift 4.0] Strings and Characters

IOS/Swift 2017. 1. 22. 16:21

Introduction
: Swift's String type is bridged with the NSString class from the Foundation framework.
Foundation also extends String to expose the public methods defined by NSString, so after importing Foundation you can call them on a String without casting.
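
As a small illustration of the point above, the sketch below calls two NSString-derived members on plain String values. The sample values (csvLine, title) are made up for this example.

```swift
import Foundation

// Made-up sample value for illustration.
let csvLine = "red,green,blue"

// components(separatedBy:) comes from NSString and is exposed on String by Foundation.
let colors = csvLine.components(separatedBy: ",")
print(colors)   // ["red", "green", "blue"]

// capitalized is another NSString-derived property available once Foundation is imported.
let title = "strings and characters".capitalized
print(title)    // Strings And Characters
```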

String Literals
: Text surrounded by double quotation marks ("") is treated as a string literal.

let someString = "Some string literal value"

Initializing an Empty String
: There are two ways to create an empty string in a variable.

var emptyString = ""
var anotherEmptyString = String()

if emptyString.isEmpty {
    print("Nothing to see here")
}
// Both variables above are empty strings and equivalent to each other; isEmpty checks whether a string is empty.

String Mutability
: Whether a String can be mutated depends on whether it is declared with var (mutable) or let (immutable).

var variableString = "Horse"
variableString += " and carriage"
// variableString is now "Horse and carriage"

let constantString = "Highlander"
constantString += " and another highlander"
// Compile-time error: a String declared with let cannot be modified.

Strings Are Value Types
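: String is a struct, so it is a value type rather than a reference type. A String value is therefore copied when it is passed to a function or method, or when it is assigned to another variable or constant.
: The Swift compiler optimizes this behavior so that the actual copy happens only when it is really needed.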

let a = "abc"
var b = a
// No actual copy has happened yet; a and b share the same underlying string storage.

b += "def"
// The copy happens here. In other words, the storage is copied only when the original value could otherwise no longer be guaranteed to stay unchanged.
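
The observable effect of value semantics can be sketched as follows. The shout(_:) helper is hypothetical and exists only for this example; the sketch shows that mutating a copy never changes the original, and says nothing about when the storage is physically copied.

```swift
// Hypothetical helper: mutates its own local copy of the argument.
func shout(_ text: String) -> String {
    var copy = text   // local copy; changes cannot leak back to the caller
    copy += "!"
    return copy
}

let original = "abc"
var duplicate = original   // may share storage until one side is mutated
duplicate += "def"

print(original)          // abc
print(duplicate)         // abcdef
print(shout(original))   // abc!
print(original)          // abc
```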

Working with Characters
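: You can iterate over the individual Character values of a String. In Swift 4 a String is itself a collection of characters, so a for-in loop works on the string directly; Swift 3 code went through the string's characters property instead.

for character in "Dog!" {
    print(character)
}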

You can also declare a constant or variable of type Character.

let exclamationMark: Character = "!"
// Without the type annotation (let exclamationMark = "!"), the type is inferred as String.
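
One quick way to confirm the inference rule is to print the dynamic type with type(of:). The constant names below are chosen for this example only.

```swift
let inferred = "!"                // no annotation: inferred as String
let annotated: Character = "!"    // explicit annotation: Character

print(type(of: inferred))    // String
print(type(of: annotated))   // Character
```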

A String can be constructed from an array of Character values.

let catCharacters: [Character] = ["C", "a", "t", "!"]
let catString = String(catCharacters)
print(catString)

Concatenating Strings and Characters

let string1 = "hello"
let string2 = " there"
var welcome = string1 + string2
// welcome now equals "hello there"

var instruction = "look over"
instruction += string2
// instruction now equals "look over there"

You can also append a Character.

let exclamationMark: Character = "!"
welcome.append(exclamationMark)
//welcome now equals "hello there!"

String Interpolation

let multiplier = 3
let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)"
// message is "3 times 2.5 is 7.5"

Unicode
: Unicode is an international standard for encoding, representing, and processing text. Swift's String and Character types are fully Unicode-compliant.

Unicode Scalars
: Swift's String type is built from Unicode scalar values. A Unicode scalar is a unique 21-bit number that identifies a character or modifier.

ex) U+0061 represents "a" (LATIN SMALL LETTER A).
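
To make the U+0061 example concrete, the sketch below reads the scalar value of "a" and rebuilds the character from the number. The zero-padding logic is just a formatting choice for this example.

```swift
// Read the numeric value of the only scalar in "a".
let scalars = Array("a".unicodeScalars)
let codePoint = scalars[0].value   // UInt32

// Format as U+XXXX, padding to at least four hex digits.
let hex = String(codePoint, radix: 16, uppercase: true)
let padded = String(repeating: "0", count: max(0, 4 - hex.count)) + hex
print("U+\(padded)")   // U+0061

// Going the other way: build a Character from the scalar value 0x61.
let rebuilt = Character(Unicode.Scalar(UInt8(0x61)))
print(rebuilt)         // a
```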

Special Characters in String Literals
: String literals can include the following special characters.

- The escaped special characters
\0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return),
\" (double quote), \' (single quote)
- An arbitrary Unicode scalar
\u{n}, where n is a 1-8 digit hexadecimal number.

let wiseWords = "\"Imagination is more important than knowledge\" - Einstein"
// "Imagination is more important than knowledge" - Einstein

let dollarSign = "\u{24}"
// $, Unicode scalar U+0024
let blackHeart = "\u{2665}"
// ♥, Unicode scalar U+2665
let sparklingHeart = "\u{1F496}"
// 💖, Unicode scalar U+1F496

Extended Grapheme Clusters
: A grapheme is the smallest meaningful unit of a writing system (e.g. 'ㄴ', 'ㄹ', 'ㅏ').
: An extended grapheme cluster is a sequence of one or more Unicode scalars that, when combined, produce a single human-readable character. Every instance of Swift's Character type represents a single extended grapheme cluster.

let precomposed: Character = "\u{D55C}"
// '한'
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}"
// 'ᄒ', 'ᅡ', and 'ᆫ' combine into '한'.
// Comparing them (precomposed == decomposed) returns true.
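
The difference between the two spellings becomes visible when counting scalars instead of characters. This sketch uses String values (named precomposedHan and decomposedHan for this example) so that both views can be inspected.

```swift
let precomposedHan = "\u{D55C}"                  // 한 as one scalar
let decomposedHan = "\u{1112}\u{1161}\u{11AB}"   // ᄒ + ᅡ + ᆫ

// Different numbers of Unicode scalars...
print(precomposedHan.unicodeScalars.count)   // 1
print(decomposedHan.unicodeScalars.count)    // 3

// ...but each string is a single extended grapheme cluster.
print(precomposedHan.count)   // 1
print(decomposedHan.count)    // 1

// The two spellings compare equal.
print(precomposedHan == decomposedHan)   // true
```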

Counting Characters

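: Use the count property to get the number of Character values in a string. In Swift 4 it is available on the String itself; Swift 3 code reads it from the string's characters property.
: String concatenation and modification do not always affect a string's character count, because extended grapheme clusters allow several Unicode scalars to combine into a single Character. See the example below.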

var word = "cafe"
// word.count is the Swift 4 spelling of word.characters.count.
print("the number of characters in \(word) is \(word.count)")
// Prints "the number of characters in cafe is 4"

word += "\u{301}"
// '´', COMBINING ACUTE ACCENT, U+0301

print("the number of characters in \(word) is \(word.count)")
// Prints "the number of characters in café is 4"

* Extended grapheme clusters can consist of one or more Unicode scalars. As a result, different characters, and even different representations of the same character, can take up different amounts of memory. This means the number of characters in a string cannot be derived from the size of its storage alone: to compute the count, the string has to be iterated to find its extended grapheme cluster boundaries. By contrast, the length of an Objective-C NSString is based on the number of 16-bit code units in the string's UTF-16 representation.
For this reason, the character count of a Swift String and the length property of an NSString holding the same text can differ.
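
The mismatch can be sketched without Foundation by using utf16.count, which counts the same 16-bit code units that NSString's length is based on. The sample strings are chosen for this example.

```swift
let accented = "cafe\u{301}"   // "café" spelled with a combining accent
let dog = "🐶"

// Character count: extended grapheme clusters.
print(accented.count)         // 4
print(dog.count)              // 1

// UTF-16 code unit count, standing in here for NSString's length.
print(accented.utf16.count)   // 5
print(dog.utf16.count)        // 2
```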

Accessing and Modifying a String
: In Swift you can access and modify a String through its methods and properties, or by using subscript syntax.

String Indices
: Each String has an associated index type, String.Index, which corresponds to the position of each Character in the string.
As mentioned above, different characters can take up different amounts of memory, so a String cannot be indexed by integer values.

let greeting = "Guten Tag!"
greeting[greeting.startIndex]
// G
greeting[greeting.index(before: greeting.endIndex)]
// !
greeting[greeting.index(after: greeting.startIndex)]
// u
let index = greeting.index(greeting.startIndex, offsetBy: 7)
greeting[index]
// a

Attempting to access an index outside the string's range triggers a runtime error.

greeting[greeting.endIndex] // Error
greeting.index(after: greeting.endIndex) // Error
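
One way to avoid this runtime error is index(_:offsetBy:limitedBy:), which returns nil instead of trapping when the offset would go past the limit. The character(in:atOffset:) helper below is hypothetical, written only to sketch the idea.

```swift
let greeting = "Guten Tag!"

// Hypothetical helper: the character at an integer offset, or nil if out of range.
func character(in text: String, atOffset offset: Int) -> Character? {
    guard offset >= 0,
          let index = text.index(text.startIndex, offsetBy: offset, limitedBy: text.endIndex),
          index != text.endIndex   // endIndex is a valid index but not a valid subscript
    else {
        return nil
    }
    return text[index]
}

if let seventh = character(in: greeting, atOffset: 7) {
    print(seventh)   // a
}
print(character(in: greeting, atOffset: 10) == nil)   // true
```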

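Use the indices property to access all of the indices of the individual characters in a string. (In Swift 4 it is available on the String itself; Swift 3 code used characters.indices.)

for index in greeting.indices {
    print("\(greeting[index]) ", terminator: "")
}
// Prints "G u t e n   T a g ! "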

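* The startIndex and endIndex properties and the index(after:) and index(_:offsetBy:) methods shown above are available on any type that conforms to the Collection protocol, so they work not only with String but also with Array, Dictionary, and Set. index(before:) additionally requires BidirectionalCollection, which String and Array satisfy but Dictionary and Set do not.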
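
A short sketch of the same index API on other collections. The sample values are made up, and the Set holds a single element so that the result does not depend on iteration order.

```swift
// Array: same index-based API as String, with Int indices.
let numbers = [10, 20, 30, 40]
let third = numbers.index(numbers.startIndex, offsetBy: 2)
print(numbers[numbers.startIndex])   // 10
print(numbers[third])                // 30

// Set: indices are opaque, but startIndex, endIndex, and index(after:) still apply.
let tags: Set<String> = ["swift"]
let first = tags.startIndex
print(tags[first])                                 // swift
print(tags.index(after: first) == tags.endIndex)   // true
```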

Inserting and Removing
: Use the insert(_:at:) and insert(contentsOf:at:) methods to insert a character or a string at a given index.
: Use the remove(at:) and removeSubrange(_:) methods to remove a character or a substring.

var welcome = "hello"

welcome.insert("!", at: welcome.endIndex)
// welcome = "hello!"

// In Swift 4 a String can be passed to contentsOf: directly (Swift 3 needed " there".characters).
welcome.insert(contentsOf: " there", at: welcome.index(before: welcome.endIndex))
// welcome = "hello there!"

welcome.remove(at: welcome.index(before: welcome.endIndex))
// welcome now equals "hello there"

let range = welcome.index(welcome.endIndex, offsetBy: -6) ..< welcome.endIndex

welcome.removeSubrange(range)
// welcome now equals "hello"

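* The insert(_:at:), insert(contentsOf:at:), remove(at:), and removeSubrange(_:) methods shown above are available on any type that conforms to the RangeReplaceableCollection protocol, so they work not only with String but also with Array. Dictionary and Set do not conform to this protocol.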
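
The same four calls can be replayed on an Array, here an array of Character values mirroring the welcome example. The variable name letters is chosen for this example.

```swift
var letters: [Character] = ["h", "e", "l", "l", "o"]

letters.insert("!", at: letters.endIndex)
// h e l l o !

letters.insert(contentsOf: " there", at: letters.index(before: letters.endIndex))
// h e l l o   t h e r e !

letters.remove(at: letters.index(before: letters.endIndex))
// h e l l o   t h e r e

let range = letters.index(letters.endIndex, offsetBy: -6)..<letters.endIndex
letters.removeSubrange(range)

print(String(letters))   // hello
```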

Comparing Strings
: Swift provides three ways to compare strings and characters: string and character equality, prefix equality, and suffix equality.

String and Character Equality

let quotation = "We're a lot alike, you and I."
let sameQuotation = "We're a lot alike, you and I."

if quotation == sameQuotation {
    print("These two strings are considered equal")
}

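Two String values (or two Character values) are considered equal if they have the same linguistic meaning and appearance, i.e. if their extended grapheme clusters are canonically equivalent. Strings composed of different Unicode scalars are therefore treated as the same string as long as their meaning and appearance match.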

let eAcuteQuestion = "Voulez-vous un caf\u{E9}?"
let combinedEAcuteQuestion = "Voulez-vous un caf\u{65}\u{301}?"
// Both constants represent "Voulez-vous un café?".

if eAcuteQuestion == combinedEAcuteQuestion {
    print("These two strings are considered equal")
}
// The two strings are built from different Unicode scalars, but they are treated as equal because their meaning and appearance are the same.

let latinCapitalLetterA: Character = "\u{41}"
let cyrillicCapitalLetterA: Character = "\u{0410}"
// One is the English (Latin) 'A', the other is the Russian (Cyrillic) 'А'.

if latinCapitalLetterA != cyrillicCapitalLetterA {
    print("These two characters are not equivalent.")
}
// The two look the same but have different linguistic meanings, so they are treated as different characters.

Prefix and Suffix Equality
: To check whether a string has a particular prefix or suffix, use the hasPrefix(_:) and hasSuffix(_:) methods.

let romeoAndJuliet = [
    "Act 1 Scene 1: A street outside Capulet's mansion",
    "Act 1 Scene 2: The Great Hall in Capulet's mansion",
    "Act 2 Scene 1: Outside Capulet's mansion",
    "Act 2 Scene 2: Outside Friar Lawrence's cell"
]

var act1SceneCount = 0

for scene in romeoAndJuliet {
    if scene.hasPrefix("Act 1 ") {
        act1SceneCount += 1
    }
}
// act1SceneCount = 2

var mansionCount = 0
var cellCount = 0

for scene in romeoAndJuliet {
    if scene.hasSuffix("Capulet's mansion") {
        mansionCount += 1
    } else if scene.hasSuffix("Friar Lawrence's cell") {
        cellCount += 1
    }
}
// mansionCount = 3, cellCount = 1

Unicode Representations of Strings
: When a Unicode string is written to a text file or some other storage, its Unicode scalars are encoded in one of several Unicode-defined encoding forms. Each form encodes the string in small chunks known as code units.
- UTF-8: encodes a string as 8-bit code units (for ASCII characters the representation is identical to ASCII encoding).
- UTF-16: encodes a string as 16-bit code units.
- UTF-32: encodes a string as 32-bit code units (a Unicode scalar is a 21-bit value, so a 32-bit code unit is the only one of the three wide enough to hold any scalar in a single unit).
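
As a rough illustration of how many code units each form needs, the sketch below counts units for three sample characters. The unicodeScalars count stands in for UTF-32 here, on the assumption that every scalar occupies exactly one 32-bit code unit.

```swift
let samples = ["A", "‼", "🐶"]

for sample in samples {
    // character, UTF-8 units, UTF-16 units, scalars (one UTF-32 unit each)
    print(sample, sample.utf8.count, sample.utf16.count, sample.unicodeScalars.count)
}
// A 1 1 1
// ‼ 3 1 1
// 🐶 4 2 1
```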

let dogString = "Dog‼🐶"
// ‼ (DOUBLE EXCLAMATION MARK, Unicode scalar U+203C), 🐶 (DOG FACE, Unicode scalar U+1F436)

Let's look at dogString above in each of these encoding forms.

UTF-8 Representation

You can access the UTF-8 representation of a String by iterating over its utf8 property. The property is of type String.UTF8View, which is a collection of UInt8 values, one for each code unit.

for codeUnit in dogString.utf8 {
    print("\(codeUnit) ", terminator: "")
}
// 68 111 103 226 128 188 240 159 144 182

UTF-16 Representation

You can access the UTF-16 representation of a String by iterating over its utf16 property. The property is of type String.UTF16View, which is a collection of UInt16 values, one for each code unit.

for codeUnit in dogString.utf16 {
    print("\(codeUnit) ", terminator: "")
}
// 68 111 103 8252 55357 56374
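
The last two values are a surrogate pair for 🐶 (U+1F436), which does not fit in a single 16-bit unit. The arithmetic can be sketched as follows; the variable names are chosen for this example.

```swift
let scalarValue: UInt32 = 0x1F436   // 🐶, decimal 128054

// Standard UTF-16 surrogate arithmetic for scalars above U+FFFF.
let offset = scalarValue - 0x10000
let high = UInt16(0xD800 + (offset >> 10))     // top 10 bits
let low = UInt16(0xDC00 + (offset & 0x3FF))    // bottom 10 bits

print(high, low)                          // 55357 56374
print(Array("🐶".utf16) == [high, low])   // true
```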

Unicode Scalar Representation

You can access the Unicode scalars of a String by iterating over its unicodeScalars property. The property is of type UnicodeScalarView, which is a collection of UnicodeScalar values.
Each UnicodeScalar holds the scalar's 21-bit value, available through its value property as a UInt32.

for scalar in dogString.unicodeScalars {
    print("\(scalar.value) ", terminator: "")
}
// 68 111 103 8252 128054

for scalar in dogString.unicodeScalars {
    print("\(scalar) ")
}
// D
// o
// g
// ‼
// 🐶


Posted by 홍성곤
