0

I have been trying, unsuccessfully, to use a C# Regex to remove certain strings from a movie name.

Examples of the file names I'm working with are:

EuroTrip (2004) [SD]

Event Horizon (1997) [720]

Fast & Furious (2009) [1080p]

Star Trek (2009) [Unknown]

I'd like to remove anything in square brackets or parentheses (including the brackets themselves).

So far I'm using:

movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([*\\(\\d{4}\\)])", "");

Which seems to remove the year and parentheses OK, but I just can't figure out how to remove the square brackets and their content without affecting other parts... I've had mixed results, but the closest one has been:

movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([?\\[+A-Z+\\]])", "");

Which left me with:

urorip (2004)

Instead of:

EuroTrip (2004) [SD]

Any whitespace that is left at the ends is OK, as I will just perform

movieTitleToFetch = movieTitleToFetch.Trim();

at the end.

Thanks in advance,

Alex

4

7 Answers

3

This regex pattern should work fine... it might need a little tweaking

"[\[\(].+?[\]\)]"

Regex.Replace(movieTitleToFetch, @"[\[\(].+?[\]\)]", "");

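This should match anything from a "[" or "(" up to the next occurrence of "]" or ")".

If that doesn't work, try removing the escape characters for the parentheses, like this...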

Regex.Replace(movieTitleToFetch, @"[\[(].+?[\])]", "");
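
As a quick check, here is a minimal sketch that runs the first pattern over the four sample titles from the question. The class and method names (`BracketStripDemo`, `Strip`) are made up for illustration, and the trailing `Trim()` is the cleanup step the question already plans to do.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketStripDemo
{
    // Pattern from the answer above: an opening '[' or '(', then as few
    // characters as possible, then a closing ']' or ')'.
    public static string Strip(string title)
    {
        return Regex.Replace(title, @"[\[\(].+?[\]\)]", "").Trim();
    }

    public static void Main()
    {
        string[] samples =
        {
            "EuroTrip (2004) [SD]",
            "Event Horizon (1997) [720]",
            "Fast & Furious (2009) [1080p]",
            "Star Trek (2009) [Unknown]"
        };

        foreach (var sample in samples)
            Console.WriteLine(Strip(sample));
    }
}
```

Removing both groups leaves two spaces at the end of each title, which is why the `Trim()` is still needed.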
answered 2011-02-15T12:28:29.640
1

@Craigt is almost right, but it may be cleaner to make sure the brackets are matched.

([\[].*?[\]]|[\(].*?[\)]) 
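
To see what "matched" buys you, here is a small sketch comparing the two patterns. The input string is a made-up edge case (a stray "]" inside the parentheses), and the class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PairedBracketDemo
{
    // Either closer may end either opener, so "(part 1]" counts as a match.
    public static string StripAnyPair(string s)
    {
        return Regex.Replace(s, @"[\[\(].+?[\]\)]", "").Trim();
    }

    // Each alternative pairs an opener with its own closer.
    public static string StripMatchedPair(string s)
    {
        return Regex.Replace(s, @"([\[].*?[\]]|[\(].*?[\)])", "").Trim();
    }

    public static void Main()
    {
        // Hypothetical input with a stray ']' inside the parentheses.
        var input = "Movie (part 1] extra) [SD]";

        Console.WriteLine(StripAnyPair(input));     // leaves "extra)" behind
        Console.WriteLine(StripMatchedPair(input)); // removes the whole "(...)" group
    }
}
```

For the four titles in the question both patterns give the same result; the difference only shows up when the brackets are mixed.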
answered 2011-02-15T12:33:27.033
1

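I know I'm late to this thread, but I wrote a simple algorithm to clean up downloaded movie file names.

It runs the following steps:

  1. Removes everything inside brackets (if a year is found there, it tries to keep that information)
  2. Removes a list of common words (720p, bdrip, h264, etc.)
  3. Assumes that language information may appear in the title and removes it when it sits at the end of the remaining string (before the special words)
  4. If no year was found inside brackets, looks for one at the end of the remaining string (as for languages)

Along the way, dots and underscores are replaced with spaces, so the title is ready to be used, for example, as a query for a search API.

Here is the test in XUnit (I used mostly Italian titles to test it)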

using Grappachu.Movideo.Core.Helpers.TitleCleaner;
using SharpTestsEx;
using Xunit;

namespace Grappachu.MoVideo.Test
{
    public class TitleCleanerTest
    {
        [Theory]
        [InlineData("Avengers.Confidential.La.Vedova.Nera.E.Punisher.2014.iTALiAN.Bluray.720p.x264 - BG.mkv",
            "Avengers Confidential La Vedova Nera E Punisher", 2014)]
        [InlineData("Fuck You, Prof! (2013) BDRip 720p HEVC ITA GER AC3 Multi Sub PirateMKV.mkv",
            "Fuck You, Prof!", 2013)]
        [InlineData("Il Libro della Giungla(2016)(BDrip1080p_H264_AC3 5.1 Ita Eng_Sub Ita Eng)by siste82.avi",
            "Il Libro della Giungla", 2016)]
        [InlineData("Il primo dei bugiardi (2009) [Mux by Little-Boy]", "Il primo dei bugiardi", 2009)]
        [InlineData("Il.Viaggio.Di.Arlo-The.Good.Dinosaur.2015.DTS.ITA.ENG.1080p.BluRay.x264-BLUWORLD",
            "il viaggio di arlo", 2015)]
        [InlineData("La Mafia Uccide Solo D'estate 2013 .avi",
            "La Mafia Uccide Solo D'estate", 2013)]
        [InlineData("Ip.Man.3.2015.iTA.AC3.5.1.448.Chi.Aac.BluRay.m1080p.x264.Sub.[scambiofile.info].mkv",
            "Ip Man 3", 2015)]
        [InlineData("Inferno.2016.BluRay.1080p.AC3.ITA.AC3.ENG.Subs.x264-WGZ.mkv",
            "Inferno", 2016)]
        [InlineData("Ghostbusters.2016.iTALiAN.BDRiP.EXTENDED.XviD-HDi.mp4",
            "Ghostbusters", 2016)]
        [InlineData("Transcendence.mkv", "Transcendence", null)]
        [InlineData("Being Human (Forsyth, 1994).mkv", "Being Human", 1994)]
        public void Clean_should_return_title_and_year_when_possible(string filename, string title, int? year)
        {
            var res = MovieTitleCleaner.Clean(filename);

            res.Title.ToLowerInvariant().Should().Be.EqualTo(title.ToLowerInvariant());
            res.Year.Should().Be.EqualTo(year);
        }
    }
}

And the first version of the code

using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions; 

namespace Grappachu.Movideo.Core.Helpers.TitleCleaner
{
    public class MovieTitleCleanerResult
    {
        public string Title { get; set; }
        public int? Year { get; set; }
        public string SubTitle { get; set; }
    }

    public class MovieTitleCleaner
    {
        private const string SpecialMarker = "§=§";
        private static readonly string[] ReservedWords;
        private static readonly string[] SpaceChars;
        private static readonly string[] Languages;

        static MovieTitleCleaner()
        {
            ReservedWords = new[]
            {
                SpecialMarker, "hevc", "bdrip", "Bluray", "x264", "h264", "AC3", "DTS", "480p", "720p", "1080p"
            };
            var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
            var l = cultures.Select(x => x.EnglishName).ToList();
            l.AddRange(cultures.Select(x => x.ThreeLetterISOLanguageName));
            Languages = l.Distinct().ToArray();


            SpaceChars = new[] {".", "_", " "};
        }


        public static MovieTitleCleanerResult Clean(string filename)
        {
            var temp = Path.GetFileNameWithoutExtension(filename);
            int? maybeYear = null;

            // Remove what's inside brackets trying to keep year info.
            temp = RemoveBrackets(temp, '{', '}', ref maybeYear);
            temp = RemoveBrackets(temp, '[', ']', ref maybeYear);
            temp = RemoveBrackets(temp, '(', ')', ref maybeYear);

            // Removes special markers (codecs, formats, etc.)
            var tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
            var title = string.Empty;
            for (var i = 0; i < tokens.Length; i++)
            {
                var tok = tokens[i];
                if (ReservedWords.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase)))
                {
                    if (title.Length > 0)
                        break;
                }
                else
                {
                    title = string.Join(" ", title, tok).Trim();
                }
            }
            temp = title;

            // Remove language info found just before the special markers (should not remove "English" if it's inside the title)
            tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
            for (var i = tokens.Length - 1; i >= 0; i--)
            {
                var tok = tokens[i];
                if (Languages.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase)))
                    tokens[i] = string.Empty;
                else
                    break;
            }
            title = string.Join(" ", tokens).Trim();


            // If year is not found inside parenthesis try to catch at the end, just after the title
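            if (!maybeYear.HasValue)
            {
                var resplit = title.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
                // LastOrDefault avoids an exception when nothing is left of the title
                var last = resplit.LastOrDefault();
                if (last != null && LooksLikeYear(last))
                {
                    maybeYear = int.Parse(last);
                    // Cut only the trailing token; Replace would also strip the same digits elsewhere in the title
                    title = title.Substring(0, title.LastIndexOf(last, StringComparison.Ordinal)).Trim();
                }
            }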


            // TODO: review this. when there's one dash separates main title from subtitle 
            var res = new MovieTitleCleanerResult();
            res.Year = maybeYear;
            if (title.Count(x => x == '-') == 1)
            {
                var sp = title.Split('-');
                res.Title = sp[0];
                res.SubTitle = sp[1];
            }
            else
            {
                res.Title = title;
            }


            return res;
        }

        private static string RemoveBrackets(string inputString, char openChar, char closeChar, ref int? maybeYear)
        {
            var str = inputString;
            while (str.IndexOf(openChar) > 0 && str.IndexOf(closeChar) > 0)
            {
                var dataGraph = str.GetBetween(openChar.ToString(), closeChar.ToString());
                if (LooksLikeYear(dataGraph))
                {
                    maybeYear = int.Parse(dataGraph);
                }
                else
                {
                    var parts = dataGraph.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
                    foreach (var part in parts)
                        if (LooksLikeYear(part))
                        {
                            maybeYear = int.Parse(part);
                            break;
                        }
                }
                str = str.ReplaceBetween(openChar, closeChar, string.Format(" {0} ", SpecialMarker));
            }
            return str;
        }

        private static bool LooksLikeYear(string dataRound)
        {
            // Anchor both ends so that only a bare four-digit year matches and int.Parse is safe
            return Regex.IsMatch(dataRound, "^(19|20)[0-9][0-9]$");
        }
    }


    public static class StringUtils
    {
        public static string GetBetween(this string src, string a, string b,
            StringComparison comparison = StringComparison.Ordinal)
        {
            var idxStr = src.IndexOf(a, comparison);
            var idxEnd = src.IndexOf(b, comparison);
            if (idxStr >= 0 && idxEnd > 0)
            {
                if (idxStr > idxEnd)
                    Swap(ref idxStr, ref idxEnd);
                return src.Substring(idxStr + a.Length, idxEnd - idxStr - a.Length);
            }
            return src;
        }

        private static void Swap<T>(ref T idxStr, ref T idxEnd)
        {
            var temp = idxEnd;
            idxEnd = idxStr;
            idxStr = temp;
        }

        public static string ReplaceBetween(this string s, char begin, char end, string replacement = null)
        {
            var regex = new Regex(string.Format("\\{0}.*?\\{1}", begin, end));
            return regex.Replace(s, replacement ?? string.Empty);
        }
    }
}
answered 2017-09-06T17:49:17.987
0

Couldn't we use this instead:

if(movieTitleToFetch.Contains("("))
         movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("("));

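The code above will certainly return the clean movie title for these strings:

EuroTrip (2004) [SD]

Event Horizon (1997) [720]

Fast & Furious (2009) [1080p]

Star Trek (2009) [Unknown]

And if it can happen that there is no year but only the type, i.e.:

EuroTrip [SD]

Event Horizon [720]

Fast & Furious [1080p]

Star Trek [Unknown]

then use this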

if(movieTitleToFetch.Contains("("))
         movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("("));
else if(movieTitleToFetch.Contains("["))
         movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("["));
answered 2011-02-15T12:31:12.240
0

This does the trick:

@"(\[[^\]]*\])|(\([^\)]*\))"

It removes anything from "[" to the next "]" and anything from "(" to the next ")".
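
A short sketch of this pattern in use. One practical property of the negated classes `[^\]]` and `[^\)]` is that they also match line breaks and empty brackets, whereas `.` skips line breaks unless `RegexOptions.Singleline` is set. The second input is a hypothetical edge case chosen to show that, and the class name is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    public static string Strip(string s)
    {
        // [^\]]* and [^\)]* stop at the first closing character and,
        // unlike '.', also match line breaks.
        return Regex.Replace(s, @"(\[[^\]]*\])|(\([^\)]*\))", "").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(Strip("Star Trek (2009) [Unknown]"));

        // Hypothetical edge cases: empty parentheses and a line break inside brackets.
        Console.WriteLine(Strip("Star Trek () [multi\nline]"));
    }
}
```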

answered 2011-02-15T12:33:11.073
0

You can use:

string MovieTitle="Star Trek (2009) [Unknown]";
movieTitleToFetch= MovieTitle.IndexOf('(')>MovieTitle.IndexOf('[')?
                    MovieTitle.Substring(0,MovieTitle.IndexOf('[')):
                    MovieTitle.Substring(0,MovieTitle.IndexOf('('));
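
Note that the expression above assumes both "(" and "[" are present: `IndexOf` returns -1 for a missing character, and `Substring(0, -1)` throws an `ArgumentOutOfRangeException`. A minimal sketch of a variant that tolerates missing brackets, using `String.IndexOfAny` (the helper name `CutAtFirstBracket` is made up for illustration):

```csharp
using System;

public static class CutAtBracketDemo
{
    public static string CutAtFirstBracket(string title)
    {
        // IndexOfAny returns the position of whichever character comes first,
        // or -1 when neither is present.
        int cut = title.IndexOfAny(new[] { '(', '[' });
        return cut < 0 ? title.Trim() : title.Substring(0, cut).Trim();
    }

    public static void Main()
    {
        Console.WriteLine(CutAtFirstBracket("Star Trek (2009) [Unknown]"));
        Console.WriteLine(CutAtFirstBracket("Star Trek [Unknown]"));
        Console.WriteLine(CutAtFirstBracket("Star Trek"));
    }
}
```

Like the original, this drops everything after the first bracket, so it only suits names where the title comes first.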
answered 2011-02-15T12:33:59.730
0

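I came up with .+\s(?<year>\(\d{4}\))\s(?<format>\[\w+\]) which matches any of your examples, and contains the year and format as named capture groups to help you replace them.

This pattern translates as:

Any character, one or more repetitions
Whitespace
Literal '(' followed by 4 digits, followed by literal ')' (year)
Whitespace
Literal '[' followed by alphanumerics, one or more repetitions, followed by literal ']' (format)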
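
Here is a minimal sketch of reading the pieces back out through named groups. It uses a variant of the pattern, not the exact regex above: a `title` group is added, the brackets are moved outside the `year` and `format` groups, and `[0-9]` replaces `\d` so that `int.Parse` only ever sees ASCII digits. The class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Variant of the pattern above (an assumption, not the answer's exact regex):
    // a "title" group is added and the brackets sit outside the year/format groups.
    private static readonly Regex TitlePattern = new Regex(
        @"^(?<title>.+)\s\((?<year>[0-9]{4})\)\s\[(?<format>\w+)\]$");

    public static bool TryParse(string name, out string title, out int year, out string format)
    {
        var m = TitlePattern.Match(name);
        title = m.Success ? m.Groups["title"].Value : null;
        year = m.Success ? int.Parse(m.Groups["year"].Value) : 0;
        format = m.Success ? m.Groups["format"].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        if (TryParse("Event Horizon (1997) [720]", out var title, out var year, out var format))
            Console.WriteLine($"{title} | {year} | {format}");
    }
}
```

Unlike the replace-based answers, this only works when the name follows the exact "Title (year) [format]" layout; anything else simply fails to match.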

answered 2011-02-15T12:39:55.087