python - What is the most stable, Pythonic, cross-platform way to split a path string into three separate variables: path, file name and extension?

Question

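I'm trying to split a string such as C:\Example\readme.txt into three variables, which would read as C:\Example, readme and .txt, for a script I'm writing. The script may be deployed on both Windows and Unix, and may have to handle both Windows and Unix paths, so I need a method that works with both conventions. I have read about several functions that do something similar, but I'm looking for input on how best to handle a single string inside a function.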

*Note: I'm running IronPython 2.6 in this environment, and I'm not sure whether it differs enough from standard Python 2.7 that I need to adjust my usage.

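Edit: I know to use os.path.splitext to get the extension from the file name, but finding a platform-independent way to get the path and the file name (on which I'll then use splitext) is what confuses me.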

score 1 · Accepted Answer

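I tend to use the os.path module, which comes in several versions depending on the operating system you run. But importing os.path should always give you the right one. If you want, you can check which OS you are using manually: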

import platform
platform.platform()

and then import the matching module from os yourself. But it is certainly much easier to just import os.path.
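A minimal sketch of what "the right one" means in CPython (I assume IronPython 2.6 behaves the same, but have not checked): os.path is an alias for either ntpath or posixpath, and both modules can be imported on any platform. That matters here, because the question wants to parse Windows paths on Unix and vice versa.

```python
import os.path
import ntpath
import posixpath

# os.path is one of these two modules, chosen when os is imported:
# ntpath on Windows, posixpath on Unix-like systems.
print(os.path.__name__)

# Both can be imported anywhere, so the parsing rules can be chosen
# explicitly instead of depending on where the script happens to run.
print(ntpath.split(r'C:\Example\readme.txt'))     # ('C:\\Example', 'readme.txt')
print(posixpath.split(r'C:\Example\readme.txt'))  # ('', 'C:\\Example\\readme.txt')
```

posixpath does not treat the backslash as a separator, so on a Unix host os.path alone cannot take a Windows path apart.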

So what you'd be interested in is:

os.path.basename(path) # To get the name of the file with extension.
os.path.splitext(os.path.basename(path))[0] # To get just the name (splitext strips only the last extension).
os.path.dirname(path) # To get the directory leading to the file.
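Cutting a file name at the first dot with split('.')[0] and removing the extension with os.path.splitext give different results as soon as a name contains more than one dot or starts with one. A small comparison on plain file names (no separators, so os.path behaves the same on every platform):

```python
import os.path

for name in ['readme.txt', 'archive.tar.gz', '.bashrc']:
    first_dot = name.split('.')[0]      # everything before the FIRST dot
    stem = os.path.splitext(name)[0]    # drops only the final extension; a leading dot does not count
    print('%-16s %-14r %r' % (name, first_dot, stem))
```

For 'archive.tar.gz' the first approach gives 'archive' while splitext gives 'archive.tar'; for '.bashrc' the first gives an empty string while splitext keeps '.bashrc' as the name.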

Hope this helps.

Caveat: I don't guarantee this is the best way.

score 1 · Accepted Answer

You want os.path.split + os.path.splitext. Next time, please take a moment to read the docs; it will be much faster than posting here.
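A sketch of that combination as a single helper. Using ntpath instead of os.path is my assumption, not part of the answer: ntpath accepts both \ and / as separators, so it splits Windows and Unix-style strings the same way on any host. The trade-off is that on a POSIX file system a backslash is a legal character in a file name, which this helper would misread. split_path is a hypothetical name.

```python
import ntpath


def split_path(path):
    """Split *path* into (directory, stem, extension).

    Assumption: ntpath is used on purpose because it treats both '\\'
    and '/' as separators, whatever platform the script runs on.
    """
    head, tail = ntpath.split(path)      # directory and last component
    stem, ext = ntpath.splitext(tail)    # last component and its extension
    return head, stem, ext


print(split_path(r'C:\Example\readme.txt'))       # ('C:\\Example', 'readme', '.txt')
print(split_path('/this/is/a/unix/folder.txt'))   # ('/this/is/a/unix', 'folder', '.txt')
print(split_path('C:\\Example\\'))                # ('C:\\Example', '', '')
```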

score 0 · Accepted Answer

I modified my code to take ChrisP's comment into account:

import re
from os.path import sep
rs = re.escape(sep)

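# %s is replaced by the escaped separator; group 1 = directory,
# group 2 = file name, group 3 = extension (with its dot).
basepat = ('(/?.*?)(?=%s?[^%s]*\Z)'
           '(?:%s([^.]*)(\.[^.]+)?)?\Z')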

print '* On a Windows platform'
sep = '\\'
print 'sep: %s  repr(s): %r' % (sep,sep)
print 'rs = re.escape(sep)'
rs = re.escape(sep)
print 'rs: %s   repr(rs): %r' % (rs,rs)
rgx = re.compile(basepat % (rs,rs,rs))
for fn in (r'C:\Example\readme.txt',
           r'C:\Example\.txt',
           r'C:\Example\readme',
           'C:\Example\\readme\\',
           'C:\Example\\rod\pl\\',
           'C:\Example\\rod\p2',
           r'C:\Egz\rod\pl\zu.pdf',
           'C:\Example\\',
           'C:\Example',
           'C:\\'):
    m = rgx.match(fn)
    if m:  print '%-21s  %r' %(fn,m.groups(''))
    else:  print fn
print

print '\n* On a Linux platform'
sep = '/'
print 'sep: %s  repr(s): %r' % (sep,sep)
print 'rs = re.escape(sep)'
rs = re.escape(sep)
print 'rs: %s   repr(rs): %r' % (rs,rs)
rgx = re.compile(basepat % (rs,rs,rs))
for fn in ('/this/is/a/unix/folder.txt',
           '/this/is/a/unix/.txt',
           '/this/is/a/unix/folder',
           '/this/is/a/unix/folder/',
           '/this/', 
           '/this'):
    m = rgx.match(fn)
    if m:  print '%-21s  %r' %(fn,m.groups(''))
    else:  print fn

Result

* On a Windows platform
sep: \  repr(s): '\\'
rs = re.escape(sep)
rs: \\   repr(rs): '\\\\'
C:\Example\readme.txt  ('C:\\Example', 'readme', '.txt')
C:\Example\.txt        ('C:\\Example', '', '.txt')
C:\Example\readme      ('C:\\Example', 'readme', '')
C:\Example\readme\     ('C:\\Example\\readme', '', '')
C:\Example\rod\pl\     ('C:\\Example\\rod\\pl', '', '')
C:\Example\rod\p2      ('C:\\Example\\rod', 'p2', '')
C:\Egz\rod\pl\zu.pdf   ('C:\\Egz\\rod\\pl', 'zu', '.pdf')
C:\Example\            ('C:\\Example', '', '')
C:\Example             ('C:', 'Example', '')
C:\                    ('C:', '', '')


* On a Linux platform
sep: /  repr(s): '/'
rs = re.escape(sep)
rs: \/   repr(rs): '\\/'
/this/is/a/unix/folder.txt  ('/this/is/a/unix', 'folder', '.txt')
/this/is/a/unix/.txt   ('/this/is/a/unix', '', '.txt')
/this/is/a/unix/folder  ('/this/is/a/unix', 'folder', '')
/this/is/a/unix/folder/  ('/this/is/a/unix/folder', '', '')
/this/                 ('/this', '', '')
/this                  ('/this', '', '')

basepat is the same for both the Windows and the Linux platform in the code above. So the real code would be:

import re
from os.path import sep
rs = re.escape(sep)
rgx = re.compile('(/?.*?)(?=%s?[^%s]*\Z)'
                 '(?:%s([^.]*)(\.[^.]+)?)?\Z'
                 % (rs,rs,rs))
etc...
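On Python 3 the same "real code" is better written with raw strings: '\Z' and '\.' in a normal string literal are invalid string escapes, which recent Python 3 versions flag with a warning. A sketch that wraps the pattern in a function so both separators can be tried on one machine (make_path_regex is a hypothetical name):

```python
import os
import re


def make_path_regex(sep=os.sep):
    """Compile the answer's pattern for a single separator character."""
    rs = re.escape(sep)  # escape the separator so it is matched literally
    # Raw strings keep \Z and \. intact. The two adjacent literals are
    # joined by the parser before % fills in the three %s slots.
    return re.compile(r'(/?.*?)(?=%s?[^%s]*\Z)'
                      r'(?:%s([^.]*)(\.[^.]+)?)?\Z' % (rs, rs, rs))


win = make_path_regex('\\')
unix = make_path_regex('/')
print(win.match(r'C:\Example\readme.txt').groups(''))
# ('C:\\Example', 'readme', '.txt')
print(unix.match('/this/is/a/unix/folder.txt').groups(''))
# ('/this/is/a/unix', 'folder', '.txt')
```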


The regex pattern '(.*)\\\\([^.\\\\]*)(\.[^.]+)?\Z' is easier to read in this form:

('(.*)'
 '\\\\'
 '([^.\\\\]*)'
 '(\.[^.]+)?'
 '\Z')
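A quick check of that pattern, written as an equivalent raw string (two backslashes instead of four). It splits a Windows path, but because it requires a backslash it does not match a Unix path at all, which the later edit of this answer addresses:

```python
import re

# Same pattern as above; in a raw string \\ needs no extra escaping.
win_pattern = re.compile(r'(.*)\\([^.\\]*)(\.[^.]+)?\Z')

print(win_pattern.match(r'C:\Example\readme.txt').groups(''))  # ('C:\\Example', 'readme', '.txt')
print(win_pattern.match('/this/is/a/unix/folder.txt'))         # None: no backslash to anchor on
```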

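1)
(.*) means "as many characters as possible, and this run of characters must be kept as group(1)".
* is a quantifier. Without a ? after it, it is a greedy quantifier -> "as many as possible...".
The parentheses are the symbols that mark which run of matched characters is kept in a group. Since these parentheses are the first ones in the pattern, the group is numbered 1.
Note that the pattern requires a \ character in the analysed text, because \\\\ follows .*. The greedy .* therefore gives characters back until it stands just before the last \ in the text, which keeps it possible for the compiled regex to match; that would not be the case if .* kept the whole analysed string.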
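A small illustration of the greedy behaviour described in 1), on a made-up string, with the lazy variant (.*?) for contrast:

```python
import re

s = r'C:\a\b'
# Greedy: (.*) first takes everything, then backs off until a backslash
# follows, so group 1 ends at the LAST backslash.
print(re.match(r'(.*)\\', s).group(1))   # C:\a
# Lazy: (.*?) grows one character at a time and stops at the FIRST backslash.
print(re.match(r'(.*?)\\', s).group(1))  # C:
```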

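2)
\\\\ : when re.compile() sees this series, it interprets it as "one string char \ must be at this position in the analysed string".
Why four string chars are needed, in the string representing the regex pattern, to denote one string char \ is harder to understand. The function re.compile() has to work with regex symbols, but the only way to specify those symbols is to pass it a string as argument. That is why I wrote a string representing the regex pattern above. re.compile() does not work on that string directly: it compiles the series of symbols obtained by first interpreting the string pattern.
For common characters such as k or ;, it is simple: the symbol for k is just the string char k, and for ; it is ;.
For characters that have a special meaning for the regex engine, such as . * + ?, the expression must use the special character \: so \. means "a dot", \? means "a question mark", and so on. But the string char \ also has a special meaning of its own inside a Python string literal: it escapes the following char. For example in "\"", the \ escapes the " so that it does not end the string. The problem is that \ also escapes itself. So if someone wrote \aaa\\bbb as a regex pattern in a normal string literal, Python would turn the double backslash into a single string char \ before re.compile() ever saw it, and a single \ is not the regex symbol for a literal \.
Since \\ is therefore not enough to denote the string char \, it takes 4 string chars: '\\\\' becomes \\ after Python's own escaping, and the regex engine then reads \\ as one literal \. (A raw string, r'\\', achieves the same with two.)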
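The two levels of escaping described in 2) can be checked directly in the interpreter:

```python
import re

pattern = '\\\\'           # normal string literal written with four backslashes
print(len(pattern))        # 2: Python's string escaping has already halved them
print(pattern == r'\\')    # True: the raw-string spelling of the same two chars

# The regex engine reads those two characters as one literal backslash.
print(re.match(pattern, '\\') is not None)   # True

# With '\\' (a single character after Python's escaping), the regex engine
# is left with a dangling escape and refuses to compile it.
try:
    re.compile('\\')
except re.error as exc:
    print('re.error: %s' % exc)
```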

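3)
[a4:] means "ONE character that can be a or 4 or :".
[^a4:] means "any character except one of the three characters a, 4 and :".
So [^.\\\\]* means "any character except a dot and the character \", and the star means "the character defined just before may be repeated or absent".
Note that between brackets the dot loses its special meaning and doesn't need to be escaped.
Since this is between parentheses, the run of characters matched by ([^.\\\\]*) is saved as the second group.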
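The character classes from 3), tried on short sample strings:

```python
import re

print(re.findall('[a4:]', 'xa4:y'))    # ['a', '4', ':']
print(re.findall('[^a4:]', 'xa4:y'))   # ['x', 'y']
# Inside brackets the dot is literal, so [^.\\]* stops at a dot or a backslash.
print(re.match(r'[^.\\]*', 'readme.txt').group())   # readme
print(re.match(r'[^.\\]*', r'dir\readme').group())  # dir
```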

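4)
(\.[^.]+)? means "a dot followed by a run of characters that can be anything except a dot". The dot must be escaped here so that it denotes only "a dot character" and not "any character".
Because of the +, the dot, if present in the analysed text, must be followed by at least one character.
But the special character ? after the whole group means that this run (a dot plus at least one non-dot character) may be absent.

5)
\Z means the end of the string. It forces every match to cover the analysed string up to its very end; a match that stops before the end is not accepted.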
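A sketch checking the points made about the optional extension group and \Z:

```python
import re

ext = re.compile(r'(\.[^.]+)?\Z')
print(ext.match('.txt').group(1))   # .txt
print(ext.match('').group(1))       # None: the whole group is optional
print(ext.match('.'))               # None: a dot must be followed by at least one non-dot

# \Z forces the match to reach the end of the string.
print(re.match('abc', 'abcd') is not None)   # True
print(re.match(r'abc\Z', 'abcd') is None)    # True
```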


Edit

import re

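# [\\\\/] in the string literal is the regex class [\\/]: a backslash or a slash,
# so Windows and Unix separators are both accepted.
rgx = re.compile('(/?.*?)(?=[\\\\/]?[^\\\\/]*\Z)'
                 '(?:[\\\\/]([^.]*)(\.[^.]+)?)?\Z')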

for fn in ('C:\\Example\\readme.txt',
           'C:\\Example\\.txt',
           'C:\\Example\\readme',
           'C:\\Example\\readme\\',
           'C:\\Example\\rod\pl\\',
           'C:\\Example\\rod\p2',
           'C:\\Egz\\rod\\pl\\zu.pdf',
           'C:\\Example\\',
           'C:\\Example',
           'C:\\',
           '/this/is/a/unix/folder.txt',
           '/this/is/a/unix/.txt',
           '/this/is/a/unix/folder',
           '/this/is/a/unix/folder/',
           '/this/', 
           '/this',
           '\\machine\\share\\folder',
           'c:/folder\folder2',
           'c:\\folder\\..\\folder2',
           'c:\\folder\\..\\fofo2.txt',
           'c:\\folder\\..\\ki/fofo2.txt'):
    m = rgx.match(fn)
    if m:  print '%-26s  %r' %(fn,m.groups(''))
    else:  print fn,'   ***No match***'

Result

C:\Example\readme.txt       ('C:\\Example', 'readme', '.txt')
C:\Example\.txt             ('C:\\Example', '', '.txt')
C:\Example\readme           ('C:\\Example', 'readme', '')
C:\Example\readme\          ('C:\\Example\\readme', '', '')
C:\Example\rod\pl\          ('C:\\Example\\rod\\pl', '', '')
C:\Example\rod\p2           ('C:\\Example\\rod', 'p2', '')
C:\Egz\rod\pl\zu.pdf        ('C:\\Egz\\rod\\pl', 'zu', '.pdf')
C:\Example\                 ('C:\\Example', '', '')
C:\Example                  ('C:', 'Example', '')
C:\                         ('C:', '', '')
/this/is/a/unix/folder.txt  ('/this/is/a/unix', 'folder', '.txt')
/this/is/a/unix/.txt        ('/this/is/a/unix', '', '.txt')
/this/is/a/unix/folder      ('/this/is/a/unix', 'folder', '')
/this/is/a/unix/folder/     ('/this/is/a/unix/folder', '', '')
/this/                      ('/this', '', '')
/this                       ('/this', '', '')
\machine\share\folder       ('\\machine\\share', 'folder', '')
c:/folderolder2            ('c:', 'folder\x0colder2', '')
c:\folder\..\folder2        ('c:\\folder\\..', 'folder2', '')
c:\folder\..\fofo2.txt      ('c:\\folder\\..', 'fofo2', '.txt')
c:\folder\..\ki/fofo2.txt   ('c:\\folder\\..\\ki', 'fofo2', '.txt')
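The same edited pattern can be laid out with re.VERBOSE so that each part carries a comment; with the whitespace and comments removed it is identical to the compact pattern above. Like the original, it assumes at most one dot in the file name (a name such as archive.tar.gz is not split into name and extension). The test strings here are raw strings: in the list above, 'c:/folder\folder2' is a normal literal, so \f became a form feed, which is why that line prints as folder\x0colder2.

```python
import re

# Same pattern as the edited answer, as a raw string with re.VERBOSE.
# Assumption: inputs contain no newlines and at most one dot in the file name.
PATH_RE = re.compile(r"""
    (/?.*?)              # group 1: directory part, as short as possible...
    (?=[\\/]?[^\\/]*\Z)  # ...ending where only one last component remains
    (?:
        [\\/]            # the final separator (backslash or slash)
        ([^.]*)          # group 2: file name without the extension
        (\.[^.]+)?       # group 3: optional extension such as .txt
    )?
    \Z                   # the match has to reach the end of the string
""", re.VERBOSE)

for fn in [r'C:\Example\readme.txt', '/this/is/a/unix/folder.txt',
           r'c:/folder\folder2']:
    print('%-28s %r' % (fn, PATH_RE.match(fn).groups('')))
```

With the raw string, 'c:/folder\folder2' is split into 'c:/folder' and 'folder2' as intended.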
