Python Basics

APPENDIX A Python Basics

Python, like most programming languages, has certain behaviors that can confuse anyone who is new to the language. This appendix contains an overview of the Python features that are most important to understand for anyone who wants to create Django applications and who is already familiar with another programming language (e.g., Ruby, PHP).

In this appendix you’ll learn about Python strings, Unicode, and other annoying text behaviors; Python methods and how to use them with default, optional, *args, and **kwargs arguments; Python classes and subclasses; Python loops, iterators, and generators; Python list comprehensions, generator expressions, maps, and filters; as well as how to use the Python lambda keyword for anonymous methods.

Strings, Unicode, and Other Annoying Text Behaviors

Working with text is so common in web applications that you’ll eventually be caught by some of the not-so-straightforward ways Python interprets it. First off, beware there are considerable differences in how Python 3 and Python 2 work with strings. Python 3 provides an improvement over Python 2, in the sense there are just two instead of three ways to interpret strings. But still, it’s important to know what’s going on behind the scenes in both versions so you don’t get caught off guard working with text.

Listing A-1 illustrates a series of string statements run in Python 2 to showcase this Python version’s text behavior.

Listing A-1. Python 2 literal unicode and strings

    Python 2.7.3 (default, Apr 10 2013, 06:20:15)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'
    >>> 'café & pâtisserie'
    'caf\xc3\xa9 & p\xc3\xa2tisserie'
    >>> print('\xc3\xa9')
    é
    >>> print('\xc3\xa2')
    â

© Daniel Rubio 2017. D. Rubio, Beginning Django, https://doi.org/10.1007/978-1-4842-2787-9

The first action in Listing A-1 shows the default Python encoding, which is ASCII for all Python 2.x versions. In theory, this means Python is limited to representing 128 characters, which are the basic letters and characters used by all computers; see any ASCII table for details (footnote 1). This is just in theory though, because you won’t get an error when attempting to input a non-ASCII character in Python.

If you create a string statement with non-ASCII characters like 'café & pâtisserie', you can see in Listing A-1 that the é character is output as \xc3\xa9 and the â character is output as \xc3\xa2. These outputs, which appear to be gibberish, are actually the UTF-8 byte sequences for the é and â characters, respectively. So take note that even though the default Python 2 encoding is ASCII, non-ASCII characters are stored as UTF-8 byte sequences (the bytes supplied by the terminal, which here uses UTF-8).

Next in Listing A-1 you can see that calling print() on either of these byte sequences outputs the expected é or â characters. Behind the scenes, Python 2 offers the convenience of inputting non-ASCII characters in an ASCII encoding environment by storing the string as the raw bytes it receives, here UTF-8 byte sequences. To confirm this behavior, you can use the decode() method, as illustrated in Listing A-2.

Listing A-2. Python 2 decode unicode and u'' prefixed strings

    Python 2.7.3 (default, Apr 10 2013, 06:20:15)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 'café & pâtisserie'.decode('utf-8')
    # Outputs: u'caf\xe9 & p\xe2tisserie'
    >>> print(u'\xe9')
    # Outputs: é
    >>> print(u'\xe2')
    # Outputs: â

In Listing A-2 you can see the statement 'café & pâtisserie'.decode('utf-8') outputs u'caf\xe9 & p\xe2tisserie'.
So decoding the same string from UTF-8 converts the é character, or \xc3\xa9 byte sequence, to \xe9 and the â character, or \xc3\xa2 byte sequence, to \xe2. More importantly, notice the output string in Listing A-2 is now preceded by a u to indicate a Unicode string. Therefore the é character can be represented by both \xc3\xa9 and \xe9: \xc3\xa9 is its UTF-8 byte sequence and \xe9 is its Unicode code point (U+00E9). The same applies to the â character or any other non-ASCII character.

The way Python 2 distinguishes between the two representations is by prefixing the string with a u. In Listing A-2 you can see that calling print(u'\xe9') (note the preceding u) outputs the expected é and calling print(u'\xe2') outputs the expected â.

This Python 2 convenience of allowing non-ASCII characters in an ASCII encoding environment works so long as you don’t try to forcibly convert a non-ASCII string that’s already loaded into Python into ASCII, a scenario that’s presented in Listing A-3.

Listing A-3. Python 2 UnicodeEncodeError: 'ascii' codec can't encode character

    Python 2.7.3 (default, Apr 10 2013, 06:20:15)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 'café & pâtisserie'.decode('utf-8').encode('ascii')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

1. https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html

In Listing A-3 you can see the call 'café & pâtisserie'.decode('utf-8').encode('ascii') throws a UnicodeEncodeError.
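The same distinction can be reproduced in Python 3, where the two representations are separate types (bytes for the UTF-8 byte sequence and str for the Unicode text). The following is a minimal sketch, assuming a Python 3 interpreter; the variable names are illustrative and are not taken from the book.

```python
# Python 3 sketch: UTF-8 byte sequences versus Unicode code points.
text = 'café & pâtisserie'          # str: a sequence of Unicode code points

utf8_bytes = text.encode('utf-8')   # bytes: the UTF-8 encoding of that text
print(utf8_bytes)                   # b'caf\xc3\xa9 & p\xc3\xa2tisserie'

# é is a single code point (U+00E9) but takes two bytes in UTF-8 (0xC3 0xA9).
code_point = hex(ord('é'))
lengths = (len('é'), len('é'.encode('utf-8')))
print(code_point, lengths)          # 0xe9 (1, 2)

# Decoding the bytes restores the original text.
round_trip = utf8_bytes.decode('utf-8')

# Encoding to ASCII without an error handler fails, as in Listing A-3.
error_message = None
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)
```

In Python 3 the error message refers to the character as '\xe9' rather than u'\xe9', but the position and the reason are the same as in the Python 2 traceback.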
In Listing A-3 there’s no convenience behavior, unlike when you input non-ASCII characters, because you’re trying to encode Unicode characters (i.e., \xe9 or \xe2) into ASCII, so Python rightfully tells you it doesn’t know how to treat characters that are outside of ASCII’s 128-character range.

You can, of course, force ASCII output on non-ASCII characters, but you’ll need to pass an additional argument to the encode() method as illustrated in Listing A-4.

Listing A-4. Python 2 encode arguments to process Unicode to ASCII

    Python 2.7.3 (default, Apr 10 2013, 06:20:15)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 'café & pâtisserie'.decode('utf-8').encode('ascii','replace')
    # Outputs: 'caf? & p?tisserie'
    >>> 'café & pâtisserie'.decode('utf-8').encode('ascii','ignore')
    # Outputs: 'caf & ptisserie'
    >>> 'café & pâtisserie'.decode('utf-8').encode('ascii','xmlcharrefreplace')
    # Outputs: 'caf&#233; & p&#226;tisserie'
    >>> 'café & pâtisserie'.decode('utf-8').encode('ascii','backslashreplace')
    # Outputs: 'caf\\xe9 & p\\xe2tisserie'

As you can see in Listing A-4, you can pass a second argument to the encode() method to handle non-ASCII characters: the replace argument so the output uses ? for non-ASCII characters; the ignore argument to simply skip any non-ASCII characters; the xmlcharrefreplace argument to output the XML character reference of each non-ASCII character; or the backslashreplace argument to output each non-ASCII character as a backslash-escaped sequence.

Finally, Listing A-5 illustrates how you can create Unicode strings in Python 2 by prefixing them with the letter u.

Listing A-5. Python 2 Unicode strings prefixed with u''

    Python 2.7.3 (default, Apr 10 2013, 06:20:15)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> u'café & pâtisserie'
    u'caf\xe9 & p\xe2tisserie'
    >>> print(u'caf\xe9 & p\xe2tisserie')
    café & pâtisserie

In Listing A-5 you can see the u'café & pâtisserie' statement. By prefixing the string with u you’re telling Python it’s a Unicode string, so the output for the characters é and â is \xe9 and \xe2, respectively. And by calling print on the output for this type of string preceded by u, the output contains the expected é and â letters.

Now let’s explore how Python 3 works with unicode and strings in Listing A-6.

Listing A-6. Python 3 unicode and string

    Python 3.5.2 (default, Nov 17 2016, 17:05:23)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.getdefaultencoding()
    'utf-8'
    >>> 'café & pâtisserie'
    'café & pâtisserie'

As you can see in Listing A-6, the default encoding is UTF-8, which is the case for all Python 3.x versions. Using UTF-8 as the default, with every string being a Unicode string, makes working with text much simpler. There’s no need to worry about how special characters are handled; all text is handled as Unicode. In addition, because every string is already Unicode, the leading u on strings is redundant in Python 3. (The prefix was removed in Python 3.0 and has been accepted again, with no effect, since Python 3.3.)

Next, let’s move on to explore the use of Python’s escape character and strings. In Python, the backslash \ character is the escape character and is used to escape the special meaning of a character and declare it as a literal value. For example, to use an apostrophe in a string delimited by single quotes, you would need to escape the apostrophe so Python doesn’t confuse where the string ends (e.g., 'This is Python\'s "syntax"').

A more particular case of using Python’s backslash is on those special characters that use a backslash themselves. Listing A-7 illustrates various strings that use characters composed of a backslash so you can see this behavior.

Listing A-7.
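Separately from Listing A-7, the Python 3 points above and the encode() error handlers from Listing A-4 can be checked with a short sketch. It assumes Python 3.3 or later (so the u prefix is accepted); the variable names are illustrative and are not taken from the book.

```python
import sys

text = 'café & pâtisserie'

# In Python 3 every str is Unicode and the default encoding is UTF-8.
default_encoding = sys.getdefaultencoding()
print(default_encoding)

# The u prefix is accepted in Python 3.3+ but changes nothing.
same_string = (u'café' == 'café')

# The four error handlers from Listing A-4, applied to a Python 3 str.
# Each call returns a bytes object instead of raising UnicodeEncodeError.
handlers = ['replace', 'ignore', 'xmlcharrefreplace', 'backslashreplace']
results = {name: text.encode('ascii', name) for name in handlers}
for name, encoded in results.items():
    print(name, encoded)

# A backslash escapes the apostrophe inside a single-quoted string.
escaped = 'This is Python\'s "syntax"'
print(escaped)
```

The handler results are bytes objects in Python 3, so they print with a b prefix; apart from that they match the Python 2 outputs in Listing A-4.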
