{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "

# Day 15: Regular Expressions in Python

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Recap\n", "\n", "- Modules and Packages\n", " - Builtin Packages/Modules\n", " - math, random, calendar, os\n", " - User Defined packages/ Modules\n", " \n", "### Today Objectives\n", "\n", "- Regular Expressions (Regex)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "s = \"\"\"Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. Wikipedia\n", "Developer: Python Software Foundation\n", "Stable release: 3.9.5 / 3 May 2021; 29 days ago\n", "Preview release: 3.10.0b1 / 3 May 2021; 29 days ago\n", "Typing discipline: Duck, dynamic, strong typing; gradual (since 3.5, but ignored in CPython)\n", "First appeared: February 1991; 30 years ago\n", "Paradigm: Multi-paradigm: object-oriented, procedural (imperative), functional, structured, reflective\"\"\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['P', 'y', 't', 'h', 'o', 'n', ' ', 'i', 's', ' ', 'a', 'n', ' ', 'i', 'n', 't', 'e', 'r', 'p', 'r', 'e', 't', 'e', 'd', ' ', 'h', 'i', 'g', 'h', '-', 'l', 'e', 'v', 'e', 'l', ' ', 'g', 'e', 'n', 'e', 'r', 'a', 'l', '-', 'p', 'u', 'r', 'p', 'o', 's', 'e', ' ', 'p', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g', ' ', 'l', 'a', 'n', 'g', 'u', 'a', 'g', 'e', '.', ' ', 'P', 'y', 't', 'h', 'o', 'n', \"'\", 's', ' ', 'd', 'e', 's', 'i', 'g', 'n', ' ', 'p', 'h', 'i', 'l', 'o', 's', 'o', 'p', 'h', 'y', ' ', 'e', 'm', 'p', 'h', 'a', 's', 'i', 'z', 'e', 's', ' ', 'c', 'o', 'd', 'e', ' ', 'r', 'e', 'a', 'd', 'a', 'b', 'i', 'l', 'i', 't', 'y', ' ', 'w', 'i', 't', 'h', ' ', 'i', 't', 's', ' ', 'n', 'o', 't', 'a', 'b', 'l', 'e', ' ', 'u', 's', 'e', ' ', 'o', 'f', ' ', 's', 'i', 'g', 'n', 'i', 'f', 'i', 'c', 'a', 'n', 't', ' ', 'i', 'n', 'd', 'e', 'n', 't', 'a', 't', 
'i', 'o', 'n', '.', ' ', 'W', 'i', 'k', 'i', 'p', 'e', 'd', 'i', 'a', '\\n', 'D', 'e', 'v', 'e', 'l', 'o', 'p', 'e', 'r', ':', ' ', 'P', 'y', 't', 'h', 'o', 'n', ' ', 'S', 'o', 'f', 't', 'w', 'a', 'r', 'e', ' ', 'F', 'o', 'u', 'n', 'd', 'a', 't', 'i', 'o', 'n', '\\n', 'S', 't', 'a', 'b', 'l', 'e', ' ', 'r', 'e', 'l', 'e', 'a', 's', 'e', ':', ' ', '3', '.', '9', '.', '5', ' ', '/', ' ', '3', ' ', 'M', 'a', 'y', ' ', '2', '0', '2', '1', ';', ' ', '2', '9', ' ', 'd', 'a', 'y', 's', ' ', 'a', 'g', 'o', '\\n', 'P', 'r', 'e', 'v', 'i', 'e', 'w', ' ', 'r', 'e', 'l', 'e', 'a', 's', 'e', ':', ' ', '3', '.', '1', '0', '.', '0', 'b', '1', ' ', '/', ' ', '3', ' ', 'M', 'a', 'y', ' ', '2', '0', '2', '1', ';', ' ', '2', '9', ' ', 'd', 'a', 'y', 's', ' ', 'a', 'g', 'o', '\\n', 'T', 'y', 'p', 'i', 'n', 'g', ' ', 'd', 'i', 's', 'c', 'i', 'p', 'l', 'i', 'n', 'e', ':', ' ', 'D', 'u', 'c', 'k', ',', ' ', 'd', 'y', 'n', 'a', 'm', 'i', 'c', ',', ' ', 's', 't', 'r', 'o', 'n', 'g', ' ', 't', 'y', 'p', 'i', 'n', 'g', ';', ' ', 'g', 'r', 'a', 'd', 'u', 'a', 'l', ' ', '(', 's', 'i', 'n', 'c', 'e', ' ', '3', '.', '5', ',', ' ', 'b', 'u', 't', ' ', 'i', 'g', 'n', 'o', 'r', 'e', 'd', ' ', 'i', 'n', ' ', 'C', 'P', 'y', 't', 'h', 'o', 'n', ')', '\\n', 'F', 'i', 'r', 's', 't', ' ', 'a', 'p', 'p', 'e', 'a', 'r', 'e', 'd', ':', ' ', 'F', 'e', 'b', 'r', 'u', 'a', 'r', 'y', ' ', '1', '9', '9', '1', ';', ' ', '3', '0', ' ', 'y', 'e', 'a', 'r', 's', ' ', 'a', 'g', 'o', '\\n', 'P', 'a', 'r', 'a', 'd', 'i', 'g', 'm', ':', ' ', 'M', 'u', 'l', 't', 'i', '-', 'p', 'a', 'r', 'a', 'd', 'i', 'g', 'm', ':', ' ', 'o', 'b', 'j', 'e', 'c', 't', '-', 'o', 'r', 'i', 'e', 'n', 't', 'e', 'd', ',', ' ', 'p', 'r', 'o', 'c', 'e', 'd', 'u', 'r', 'a', 'l', ' ', '(', 'i', 'm', 'p', 'e', 'r', 'a', 't', 'i', 'v', 'e', ')', ',', ' ', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', 'a', 'l', ',', ' ', 's', 't', 'r', 'u', 'c', 't', 'u', 'r', 'e', 'd', ',', ' ', 'r', 'e', 'f', 'l', 'e', 'c', 't', 'i', 'v', 'e']\n" ] } ], "source": [ 
"print(list(s))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['P', 'P', 'W', 'D', 'P', 'S', 'F', 'S', 'M', 'P', 'M', 'T', 'D', 'C', 'P', 'F', 'F', 'P', 'M']\n" ] } ], "source": [ "# Collect every uppercase character in s\n", "li = []\n", "\n", "for char in s:\n", "    if char.isupper():\n", "        li.append(char)\n", "print(li)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[' ', ' ', ' ', ' ', '-', ' ', '-', ' ', ' ', '.', ' ', \"'\", ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '.', ' ', '\\n', ':', ' ', ' ', ' ', '\\n', ' ', ':', ' ', '.', '.', ' ', '/', ' ', ' ', ' ', ';', ' ', ' ', ' ', '\\n', ' ', ':', ' ', '.', '.', ' ', '/', ' ', ' ', ' ', ';', ' ', ' ', ' ', '\\n', ' ', ':', ' ', ',', ' ', ',', ' ', ' ', ';', ' ', ' ', '(', ' ', '.', ',', ' ', ' ', ' ', ' ', ')', '\\n', ' ', ':', ' ', ' ', ';', ' ', ' ', ' ', '\\n', ':', ' ', '-', ':', ' ', '-', ',', ' ', ' ', '(', ')', ',', ' ', ',', ' ', ',', ' ']\n" ] } ], "source": [ "# Collect every character that is neither a letter nor a digit\n", "li = []\n", "\n", "for char in s:\n", "    if not char.isalnum():\n", "        li.append(char)\n", "print(li)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "565\n" ] } ], "source": [ "print(len(s))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Replace all special characters with `**`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. 
Wikipedia\n", "Developer: Python Software Foundation\n", "Stable release: 3.9.5 / 3 May 2021; 29 days ago\n", "Preview release: 3.10.0b1 / 3 May 2021; 29 days ago\n", "Typing discipline: Duck, dynamic, strong typing; gradual (since 3.5, but ignored in CPython)\n", "First appeared: February 1991; 30 years ago\n", "Paradigm: Multi-paradigm: object-oriented, procedural (imperative), functional, structured, reflective\n" ] } ], "source": [ "print(s)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python**is**an**interpreted**high**level**general**purpose**programming**language****Python**s**design**philosophy**emphasizes**code**readability**with**its**notable**use**of**significant**indentation****Wikipedia**Developer****Python**Software**Foundation**Stable**release****3**9**5******3**May**2021****29**days**ago**Preview**release****3**10**0b1******3**May**2021****29**days**ago**Typing**discipline****Duck****dynamic****strong**typing****gradual****since**3**5****but**ignored**in**CPython****First**appeared****February**1991****30**years**ago**Paradigm****Multi**paradigm****object**oriented****procedural****imperative******functional****structured****reflective\n" ] } ], "source": [ "# Replace every non-alphanumeric character (spaces and newlines included) with '**'\n", "s1 = \"\"\n", "\n", "for char in s:\n", "    if not char.isalnum():\n", "        s1 += '**'\n", "    else:\n", "        s1 += char\n", "\n", "print(s1)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python is an interpreted high**level general**purpose programming language** Python**s design philosophy emphasizes code readability with its notable use of significant indentation** Wikipedia\n", "Developer** Python Software Foundation\n", "Stable release** 3**9**5 ** 3 May 2021** 29 days ago\n", "Preview release** 3**10**0b1 ** 3 May 2021** 29 days ago\n", "Typing discipline** Duck** dynamic** strong typing** gradual **since 3**5** 
but ignored in CPython**\n", "First appeared** February 1991** 30 years ago\n", "Paradigm** Multi**paradigm** object**oriented** procedural **imperative**** functional** structured** reflective\n" ] } ], "source": [ "s1 = \"\"\n", "\n", "for char in s:\n", "    # isspace() already covers ' ', so all whitespace is kept unchanged\n", "    if not char.isalnum() and not char.isspace():\n", "        s1 += '**'\n", "    else:\n", "        s1 += char\n", "\n", "print(s1)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['is', 'an', 'interpreted', 'high-level', 'general-purpose', 'programming', 'language.', 'design', 'philosophy', 'emphasizes', 'code', 'readability', 'with', 'its', 'notable', 'use', 'of', 'significant', 'indentation.', 'release:', 'days', 'ago', 'release:', 'days', 'ago', 'discipline:', 'dynamic,', 'strong', 'typing;', 'gradual', 'but', 'ignored', 'in', 'appeared:', 'years', 'ago', 'object-oriented,', 'procedural', 'functional,', 'structured,', 'reflective']\n" ] } ], "source": [ "# Keep only the words whose first character is a lowercase letter\n", "ss = s.split()\n", "li = []\n", "\n", "for word in ss:\n", "    if word[0].islower():\n", "        li.append(word)\n", "print(li)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regular Expression\n", "\n", "A regular expression (regex) is a sequence of ordinary and special characters that describes a pattern to match in text." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import re" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Support for regular expressions (RE).\n", "\n", "This module provides regular expression matching operations similar to\n", "those found in Perl. 
It supports both 8-bit and Unicode strings; both\n", "the pattern and the strings being processed can contain null bytes and\n", "characters outside the US ASCII range.\n", "\n", "Regular expressions can contain both special and ordinary characters.\n", "Most ordinary characters, like \"A\", \"a\", or \"0\", are the simplest\n", "regular expressions; they simply match themselves. You can\n", "concatenate ordinary characters, so last matches the string 'last'.\n", "\n", "The special characters are:\n", " \".\" Matches any character except a newline.\n", " \"^\" Matches the start of the string.\n", " \"$\" Matches the end of the string or just before the newline at\n", " the end of the string.\n", " \"*\" Matches 0 or more (greedy) repetitions of the preceding RE.\n", " Greedy means that it will match as many repetitions as possible.\n", " \"+\" Matches 1 or more (greedy) repetitions of the preceding RE.\n", " \"?\" Matches 0 or 1 (greedy) of the preceding RE.\n", " *?,+?,?? Non-greedy versions of the previous three special characters.\n", " {m,n} Matches from m to n repetitions of the preceding RE.\n", " {m,n}? Non-greedy version of the above.\n", " \"\\\\\" Either escapes special characters or signals a special sequence.\n", " [] Indicates a set of characters.\n", " A \"^\" as the first character indicates a complementing set.\n", " \"|\" A|B, creates an RE that will match either A or B.\n", " (...) Matches the RE inside the parentheses.\n", " The contents can be retrieved or matched later in the string.\n", " (?aiLmsux) The letters set the corresponding flags defined below.\n", " (?:...) Non-grouping version of regular parentheses.\n", " (?P<name>...) The substring matched by the group is accessible by name.\n", " (?P=name) Matches the text matched earlier by the group named name.\n", " (?#...) A comment; ignored.\n", " (?=...) Matches if ... matches next, but doesn't consume the string.\n", " (?!...) Matches if ... doesn't match next.\n", " (?<=...) 
Matches if preceded by ... (must be fixed length).\n", " (? | Creates a named group when performing matches. |\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Syntax\n", "\n", "- `re.method(pattern, string)`, for example `re.findall(pattern, string)`, which returns a list of all non-overlapping matches of `pattern` in `string`" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['P', 'P', 'P', 'P', 'P', 'P']\n" ] } ], "source": [ "print(re.findall('P', s))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Pyt', 'Pyt', 'Pyt', 'Pre', 'Pyt', 'Par']\n" ] } ], "source": [ "print(re.findall('P..', s))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['an ', 'al-', 'amm', 'ang', 'age', 'asi', 'ada', 'abl', 'ant', 'ati', 'are', 'ati', 'abl', 'ase', 'ay ', 'ays', 'ago', 'ase', 'ay ', 'ays', 'ago', 'ami', 'adu', 'al ', 'app', 'are', 'ary', 'ars', 'ago', 'ara', 'ara', 'al ', 'ati', 'al,']\n" ] } ], "source": [ "print(re.findall('a..', s))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. 
Wikipedia\n", "Developer: Python Software Foundation\n", "Stable release: 3.9.5 / 3 May 2021; 29 days ago\n", "Preview release: 3.10.0b1 / 3 May 2021; 29 days ago\n", "Typing discipline: Duck, dynamic, strong typing; gradual (since 3.5, but ignored in CPython)\n", "First appeared: February 1991; 30 years ago\n", "Paradigm: Multi-paradigm: object-oriented, procedural (imperative), functional, structured, reflective\n" ] } ], "source": [ "print(s)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Pyt']\n" ] } ], "source": [ "print(re.findall('^P..', s))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Py']\n" ] } ], "source": [ "print(re.findall('\\A..', s))" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['ive']\n" ] } ], "source": [ "print(re.findall('..e$', s))" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['ctive']\n" ] } ], "source": [ "print(re.findall('....e$', s))" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "print(re.findall('....E$', s))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. 
Wikipedia\\nDeveloper: Python Software Foundation\\nStable release: 3.9.5 / 3 May 2021; 29 days ago\\nPreview release: 3.10.0b1 / 3 May 2021; 29 days ago\\nTyping discipline: Duck, dynamic, strong typing; gradual (since 3.5, but ignored in CPython)\\nFirst appeared: February 1991; 30 years ago\\nParadigm: Multi-paradigm: object-oriented, procedural (imperative), functional, structured, reflective\"" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['vkbZ']\n" ] } ], "source": [ "st = 'kjbdvkjsdbvkjbvkbZ'\n", "\n", "\n", "print(re.findall('....\\Z', st))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Pyt', 'hon', ' is', ' an', ' in', 'ter', 'pre', 'ted', ' hi', 'gh-', 'lev', 'el ', 'gen', 'era', 'l-p', 'urp', 'ose', ' pr', 'ogr', 'amm', 'ing', ' la', 'ngu', 'age', '. 
P', 'yth', \"on'\", 's d', 'esi', 'gn ', 'phi', 'los', 'oph', 'y e', 'mph', 'asi', 'zes', ' co', 'de ', 'rea', 'dab', 'ili', 'ty ', 'wit', 'h i', 'ts ', 'not', 'abl', 'e u', 'se ', 'of ', 'sig', 'nif', 'ica', 'nt ', 'ind', 'ent', 'ati', 'on.', ' Wi', 'kip', 'edi', 'Dev', 'elo', 'per', ': P', 'yth', 'on ', 'Sof', 'twa', 're ', 'Fou', 'nda', 'tio', 'Sta', 'ble', ' re', 'lea', 'se:', ' 3.', '9.5', ' / ', '3 M', 'ay ', '202', '1; ', '29 ', 'day', 's a', 'Pre', 'vie', 'w r', 'ele', 'ase', ': 3', '.10', '.0b', '1 /', ' 3 ', 'May', ' 20', '21;', ' 29', ' da', 'ys ', 'ago', 'Typ', 'ing', ' di', 'sci', 'pli', 'ne:', ' Du', 'ck,', ' dy', 'nam', 'ic,', ' st', 'ron', 'g t', 'ypi', 'ng;', ' gr', 'adu', 'al ', '(si', 'nce', ' 3.', '5, ', 'but', ' ig', 'nor', 'ed ', 'in ', 'CPy', 'tho', 'Fir', 'st ', 'app', 'ear', 'ed:', ' Fe', 'bru', 'ary', ' 19', '91;', ' 30', ' ye', 'ars', ' ag', 'Par', 'adi', 'gm:', ' Mu', 'lti', '-pa', 'rad', 'igm', ': o', 'bje', 'ct-', 'ori', 'ent', 'ed,', ' pr', 'oce', 'dur', 'al ', '(im', 'per', 'ati', 've)', ', f', 'unc', 'tio', 'nal', ', s', 'tru', 'ctu', 'red', ', r', 'efl', 'ect', 'ive']\n" ] } ], "source": [ "print(re.findall('...', s))" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Pyt', 'hon', ' is', ' an', ' in', 'ter', 'pre', 'ted', ' hi', 'h-l', 'eve', 'l g', 'ene', 'ral', '-pu', 'rpo', 'e p', 'rog', 'ram', 'min', 'g l', 'ang', 'uag', ' Py', 'tho', \"n's\", ' de', 'sig', 'n p', 'hil', 'oso', 'phy', ' em', 'pha', 'siz', 's c', 'ode', ' re', 'ada', 'bil', 'ity', ' wi', 'h i', 's n', 'ota', 'ble', ' us', 'e o', 'f s', 'ign', 'ifi', 'can', 't i', 'nde', 'nta', 'tio', ' Wi', 'kip', 'edi', 'Dev', 'elo', 'per', ' Py', 'tho', ' So', 'ftw', 'are', ' Fo', 'und', 'ati', 'Sta', 'ble', ' re', 'lea', ' Ma', '9 d', 'ays', ' ag', 'Pre', 'vie', 'w r', 'ele', 'ase', '.0b', ' Ma', '9 d', 'ays', ' ag', 'Typ', 'ing', ' di', 'sci', 'pli', ' Du', ', d', 'yna', 'mic', ', s', 
'tro', 'g t', 'ypi', '; g', 'rad', 'ual', ' (s', 'inc', ', b', 't i', 'gno', 'red', ' in', 'CPy', 'tho', 'Fir', 't a', 'ppe', 'are', ' Fe', 'bru', 'ary', '0 y', 'ear', 's a', 'Par', 'adi', ' Mu', 'lti', '-pa', 'rad', 'igm', ': o', 'bje', 't-o', 'rie', 'nte', ', p', 'roc', 'edu', 'ral', ' (i', 'mpe', 'rat', 'ive', ', f', 'unc', 'tio', 'nal', ', s', 'tru', 'ctu', 'red', ', r', 'efl', 'ect', 'ive']\n" ] } ], "source": [ "print(re.findall('..[a-z]', s))" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['yth', 'n i', 's a', 'n i', 'nte', 'rpr', 'ete', 'd h', 'igh', 'lev', 'l g', 'ene', 'ral', 'pur', 'pos', 'e p', 'rog', 'ram', 'min', 'g l', 'ang', 'uag', 'yth', \"n's\", 'des', 'ign', 'phi', 'los', 'oph', 'y e', 'mph', 'asi', 'zes', 'cod', 'e r', 'ead', 'abi', 'lit', 'y w', 'ith', 'its', 'not', 'abl', 'e u', 'e o', 'f s', 'ign', 'ifi', 'can', 't i', 'nde', 'nta', 'tio', 'iki', 'ped', 'eve', 'lop', 'yth', 'oft', 'war', 'oun', 'dat', 'ion', 'tab', 'e r', 'ele', 'ase', 'day', 's a', 'rev', 'iew', 'rel', 'eas', 'day', 's a', 'ypi', 'g d', 'isc', 'ipl', 'ine', 'uck', 'dyn', 'ami', 'str', 'ong', 'typ', 'ing', 'gra', 'dua', 'sin', 'but', 'ign', 'ore', 'd i', 'yth', 'irs', 't a', 'ppe', 'are', 'ebr', 'uar', 'yea', 's a', 'ara', 'dig', 'ult', 'i-p', 'ara', 'dig', 'obj', 'ect', 'ori', 'ent', 'pro', 'ced', 'ura', 'imp', 'era', 'tiv', 'fun', 'cti', 'ona', 'str', 'uct', 'ure', 'ref', 'lec', 'tiv']\n" ] } ], "source": [ "print(re.findall('[a-z].[a-z]', s))" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Pyt', 'hon', 's a', 'n i', 'nte', 'rpr', 'ete', 'd h', 'igh', 'lev', 'l g', 'ene', 'ral', 'pur', 'pos', 'e p', 'rog', 'ram', 'min', 'g l', 'ang', 'uag', 'Pyt', 'hon', 's d', 'esi', 'n p', 'hil', 'oso', 'phy', 'emp', 'has', 'ize', 's c', 'ode', 'rea', 'dab', 'ili', 'y w', 'ith', 'its', 'not', 'abl', 'e u', 'e 
o', 'f s', 'ign', 'ifi', 'can', 't i', 'nde', 'nta', 'tio', 'Wik', 'ipe', 'dia', 'Dev', 'elo', 'per', 'Pyt', 'hon', 'Sof', 'twa', 'Fou', 'nda', 'tio', 'Sta', 'ble', 'rel', 'eas', 'May', 'day', 's a', 'Pre', 'vie', 'w r', 'ele', 'ase', 'May', 'day', 's a', 'Typ', 'ing', 'dis', 'cip', 'lin', 'Duc', 'dyn', 'ami', 'str', 'ong', 'typ', 'ing', 'gra', 'dua', 'sin', 'but', 'ign', 'ore', 'd i', 'CPy', 'tho', 'Fir', 't a', 'ppe', 'are', 'Feb', 'rua', 'yea', 's a', 'Par', 'adi', 'Mul', 'i-p', 'ara', 'dig', 'obj', 'ect', 'ori', 'ent', 'pro', 'ced', 'ura', 'imp', 'era', 'tiv', 'fun', 'cti', 'ona', 'str', 'uct', 'ure', 'ref', 'lec', 'tiv']\n" ] } ], "source": [ "print(re.findall('[A-Za-z].[a-z]', s))" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Pyt', 'hon', 'int', 'erp', 'ret', 'hig', 'lev', 'gen', 'era', 'pur', 'pos', 'pro', 'gra', 'mmi', 'lan', 'gua', 'Pyt', 'hon', 'des', 'ign', 'phi', 'los', 'oph', 'emp', 'has', 'ize', 'cod', 'rea', 'dab', 'ili', 'wit', 'its', 'not', 'abl', 'use', 'sig', 'nif', 'ica', 'ind', 'ent', 'ati', 'Wik', 'ipe', 'dia', 'Dev', 'elo', 'per', 'Pyt', 'hon', 'Sof', 'twa', 'Fou', 'nda', 'tio', 'Sta', 'ble', 'rel', 'eas', 'May', 'day', 'ago', 'Pre', 'vie', 'rel', 'eas', 'May', 'day', 'ago', 'Typ', 'ing', 'dis', 'cip', 'lin', 'Duc', 'dyn', 'ami', 'str', 'ong', 'typ', 'ing', 'gra', 'dua', 'sin', 'but', 'ign', 'ore', 'Pyt', 'hon', 'Fir', 'app', 'ear', 'Feb', 'rua', 'yea', 'ago', 'Par', 'adi', 'Mul', 'par', 'adi', 'obj', 'ect', 'ori', 'ent', 'pro', 'ced', 'ura', 'imp', 'era', 'tiv', 'fun', 'cti', 'ona', 'str', 'uct', 'ure', 'ref', 'lec', 'tiv']\n" ] } ], "source": [ "print(re.findall('[A-Za-z][a-z][a-z]', s))" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Pyt', 'Pyt', 'Wik', 'Dev', 'Pyt', 'Sof', 'Fou', 'Sta', 'May', 'Pre', 'May', 'Typ', 'Duc', 'Pyt', 'Fir', 'Feb', 'Par', 'Mul']\n" 
] } ], "source": [ "print(re.findall('[A-Z][a-z][a-z]', s))" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "d = 'Today I recieved 10$ from my boss'" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['10$']\n" ] } ], "source": [ "print(re.findall('..\\$', d))" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['ss']\n" ] } ], "source": [ "print(re.findall('..$', d))" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "print(re.findall('..e$', d))" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['P', 'y', 't', 'h', 'o', 'n', ' ', 'i', 's', ' ', 'a', 'n', ' ', 'i', 'n', 't', 'e', 'r', 'p', 'r', 'e', 't', 'e', 'd', ' ', 'h', 'i', 'g', 'h', '-', 'l', 'e', 'v', 'e', 'l', ' ', 'g', 'e', 'n', 'e', 'r', 'a', 'l', '-', 'p', 'u', 'r', 'p', 'o', 's', 'e', ' ', 'p', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g', ' ', 'l', 'a', 'n', 'g', 'u', 'a', 'g', 'e', '.', ' ', 'P', 'y', 't', 'h', 'o', 'n', \"'\", 's', ' ', 'd', 'e', 's', 'i', 'g', 'n', ' ', 'p', 'h', 'i', 'l', 'o', 's', 'o', 'p', 'h', 'y', ' ', 'e', 'm', 'p', 'h', 'a', 's', 'i', 'z', 'e', 's', ' ', 'c', 'o', 'd', 'e', ' ', 'r', 'e', 'a', 'd', 'a', 'b', 'i', 'l', 'i', 't', 'y', ' ', 'w', 'i', 't', 'h', ' ', 'i', 't', 's', ' ', 'n', 'o', 't', 'a', 'b', 'l', 'e', ' ', 'u', 's', 'e', ' ', 'o', 'f', ' ', 's', 'i', 'g', 'n', 'i', 'f', 'i', 'c', 'a', 'n', 't', ' ', 'i', 'n', 'd', 'e', 'n', 't', 'a', 't', 'i', 'o', 'n', '.', ' ', 'W', 'i', 'k', 'i', 'p', 'e', 'd', 'i', 'a', 'D', 'e', 'v', 'e', 'l', 'o', 'p', 'e', 'r', ':', ' ', 'P', 'y', 't', 'h', 'o', 'n', ' ', 'S', 'o', 'f', 't', 'w', 'a', 'r', 'e', ' ', 'F', 
'o', 'u', 'n', 'd', 'a', 't', 'i', 'o', 'n', 'S', 't', 'a', 'b', 'l', 'e', ' ', 'r', 'e', 'l', 'e', 'a', 's', 'e', ':', ' ', '3', '.', '9', '.', '5', ' ', '/', ' ', '3', ' ', 'M', 'a', 'y', ' ', '2', '0', '2', '1', ';', ' ', '2', '9', ' ', 'd', 'a', 'y', 's', ' ', 'a', 'g', 'o', 'P', 'r', 'e', 'v', 'i', 'e', 'w', ' ', 'r', 'e', 'l', 'e', 'a', 's', 'e', ':', ' ', '3', '.', '1', '0', '.', '0', 'b', '1', ' ', '/', ' ', '3', ' ', 'M', 'a', 'y', ' ', '2', '0', '2', '1', ';', ' ', '2', '9', ' ', 'd', 'a', 'y', 's', ' ', 'a', 'g', 'o', 'T', 'y', 'p', 'i', 'n', 'g', ' ', 'd', 'i', 's', 'c', 'i', 'p', 'l', 'i', 'n', 'e', ':', ' ', 'D', 'u', 'c', 'k', ',', ' ', 'd', 'y', 'n', 'a', 'm', 'i', 'c', ',', ' ', 's', 't', 'r', 'o', 'n', 'g', ' ', 't', 'y', 'p', 'i', 'n', 'g', ';', ' ', 'g', 'r', 'a', 'd', 'u', 'a', 'l', ' ', '(', 's', 'i', 'n', 'c', 'e', ' ', '3', '.', '5', ',', ' ', 'b', 'u', 't', ' ', 'i', 'g', 'n', 'o', 'r', 'e', 'd', ' ', 'i', 'n', ' ', 'C', 'P', 'y', 't', 'h', 'o', 'n', ')', 'F', 'i', 'r', 's', 't', ' ', 'a', 'p', 'p', 'e', 'a', 'r', 'e', 'd', ':', ' ', 'F', 'e', 'b', 'r', 'u', 'a', 'r', 'y', ' ', '1', '9', '9', '1', ';', ' ', '3', '0', ' ', 'y', 'e', 'a', 'r', 's', ' ', 'a', 'g', 'o', 'P', 'a', 'r', 'a', 'd', 'i', 'g', 'm', ':', ' ', 'M', 'u', 'l', 't', 'i', '-', 'p', 'a', 'r', 'a', 'd', 'i', 'g', 'm', ':', ' ', 'o', 'b', 'j', 'e', 'c', 't', '-', 'o', 'r', 'i', 'e', 'n', 't', 'e', 'd', ',', ' ', 'p', 'r', 'o', 'c', 'e', 'd', 'u', 'r', 'a', 'l', ' ', '(', 'i', 'm', 'p', 'e', 'r', 'a', 't', 'i', 'v', 'e', ')', ',', ' ', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', 'a', 'l', ',', ' ', 's', 't', 'r', 'u', 'c', 't', 'u', 'r', 'e', 'd', ',', ' ', 'r', 'e', 'f', 'l', 'e', 'c', 't', 'i', 'v', 'e']\n" ] } ], "source": [ "print(re.findall('.', s))" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['.', '.', '.', '.', '.', '.', '.']\n" ] } ], "source": [ "print(re.findall('\\.', s))" ] }, 
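{ "cell_type": "markdown", "metadata": {}, "source": [ "The difference between a metacharacter and its escaped, literal form can be sketched as follows. This is a minimal illustration rather than part of the original session: `price_text` is a made-up sample string, and the patterns are written as raw strings (`r'...'`) so that Python hands the backslashes to `re` unchanged. `re.escape()` is shown as one way to escape a whole literal at once." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re\n", "\n", "# Hypothetical sample text (an assumption, not taken from the cells above)\n", "price_text = 'Today I received 10$ and 3.5% interest.'\n", "\n", "# Unescaped '$' anchors at the end of the string: matches the last two characters\n", "print(re.findall(r'..$', price_text))\n", "\n", "# Escaped '\\$' matches a literal dollar sign plus the two characters before it\n", "print(re.findall(r'..\\$', price_text))\n", "\n", "# re.escape() adds the backslashes needed to match '3.5%' literally\n", "print(re.findall(re.escape('3.5%'), price_text))" ] }, 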
{ "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['P', 'y', 't', 'h', 'o', 'n', 'i', 's', 'a', 'n', 'i', 'n', 't', 'e', 'r', 'p', 'r', 'e', 't', 'e', 'd', 'h', 'i', 'g', 'h', 'l', 'e', 'v', 'e', 'l', 'g', 'e', 'n', 'e', 'r', 'a', 'l', 'p', 'u', 'r', 'p', 'o', 's', 'e', 'p', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g', 'l', 'a', 'n', 'g', 'u', 'a', 'g', 'e', 'P', 'y', 't', 'h', 'o', 'n', 's', 'd', 'e', 's', 'i', 'g', 'n', 'p', 'h', 'i', 'l', 'o', 's', 'o', 'p', 'h', 'y', 'e', 'm', 'p', 'h', 'a', 's', 'i', 'z', 'e', 's', 'c', 'o', 'd', 'e', 'r', 'e', 'a', 'd', 'a', 'b', 'i', 'l', 'i', 't', 'y', 'w', 'i', 't', 'h', 'i', 't', 's', 'n', 'o', 't', 'a', 'b', 'l', 'e', 'u', 's', 'e', 'o', 'f', 's', 'i', 'g', 'n', 'i', 'f', 'i', 'c', 'a', 'n', 't', 'i', 'n', 'd', 'e', 'n', 't', 'a', 't', 'i', 'o', 'n', 'W', 'i', 'k', 'i', 'p', 'e', 'd', 'i', 'a', 'D', 'e', 'v', 'e', 'l', 'o', 'p', 'e', 'r', 'P', 'y', 't', 'h', 'o', 'n', 'S', 'o', 'f', 't', 'w', 'a', 'r', 'e', 'F', 'o', 'u', 'n', 'd', 'a', 't', 'i', 'o', 'n', 'S', 't', 'a', 'b', 'l', 'e', 'r', 'e', 'l', 'e', 'a', 's', 'e', '3', '9', '5', '3', 'M', 'a', 'y', '2', '0', '2', '1', '2', '9', 'd', 'a', 'y', 's', 'a', 'g', 'o', 'P', 'r', 'e', 'v', 'i', 'e', 'w', 'r', 'e', 'l', 'e', 'a', 's', 'e', '3', '1', '0', '0', 'b', '1', '3', 'M', 'a', 'y', '2', '0', '2', '1', '2', '9', 'd', 'a', 'y', 's', 'a', 'g', 'o', 'T', 'y', 'p', 'i', 'n', 'g', 'd', 'i', 's', 'c', 'i', 'p', 'l', 'i', 'n', 'e', 'D', 'u', 'c', 'k', 'd', 'y', 'n', 'a', 'm', 'i', 'c', 's', 't', 'r', 'o', 'n', 'g', 't', 'y', 'p', 'i', 'n', 'g', 'g', 'r', 'a', 'd', 'u', 'a', 'l', 's', 'i', 'n', 'c', 'e', '3', '5', 'b', 'u', 't', 'i', 'g', 'n', 'o', 'r', 'e', 'd', 'i', 'n', 'C', 'P', 'y', 't', 'h', 'o', 'n', 'F', 'i', 'r', 's', 't', 'a', 'p', 'p', 'e', 'a', 'r', 'e', 'd', 'F', 'e', 'b', 'r', 'u', 'a', 'r', 'y', '1', '9', '9', '1', '3', '0', 'y', 'e', 'a', 'r', 's', 'a', 'g', 'o', 'P', 'a', 
'r', 'a', 'd', 'i', 'g', 'm', 'M', 'u', 'l', 't', 'i', 'p', 'a', 'r', 'a', 'd', 'i', 'g', 'm', 'o', 'b', 'j', 'e', 'c', 't', 'o', 'r', 'i', 'e', 'n', 't', 'e', 'd', 'p', 'r', 'o', 'c', 'e', 'd', 'u', 'r', 'a', 'l', 'i', 'm', 'p', 'e', 'r', 'a', 't', 'i', 'v', 'e', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', 'a', 'l', 's', 't', 'r', 'u', 'c', 't', 'u', 'r', 'e', 'd', 'r', 'e', 'f', 'l', 'e', 'c', 't', 'i', 'v', 'e']\n" ] } ], "source": [ "print(re.findall('\\w', s))" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[' ', ' ', ' ', ' ', '-', ' ', '-', ' ', ' ', '.', ' ', \"'\", ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '.', ' ', '\\n', ':', ' ', ' ', ' ', '\\n', ' ', ':', ' ', '.', '.', ' ', '/', ' ', ' ', ' ', ';', ' ', ' ', ' ', '\\n', ' ', ':', ' ', '.', '.', ' ', '/', ' ', ' ', ' ', ';', ' ', ' ', ' ', '\\n', ' ', ':', ' ', ',', ' ', ',', ' ', ' ', ';', ' ', ' ', '(', ' ', '.', ',', ' ', ' ', ' ', ' ', ')', '\\n', ' ', ':', ' ', ' ', ';', ' ', ' ', ' ', '\\n', ':', ' ', '-', ':', ' ', '-', ',', ' ', ' ', '(', ')', ',', ' ', ',', ' ', ',', ' ']\n" ] } ], "source": [ "print(re.findall('\\W', s))" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['3', '9', '5', '3', '2', '0', '2', '1', '2', '9', '3', '1', '0', '0', '1', '3', '2', '0', '2', '1', '2', '9', '3', '5', '1', '9', '9', '1', '3', '0']\n" ] } ], "source": [ "print(re.findall('\\d', s))" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['P', 'y', 't', 'h', 'o', 'n', ' ', 'i', 's', ' ', 'a', 'n', ' ', 'i', 'n', 't', 'e', 'r', 'p', 'r', 'e', 't', 'e', 'd', ' ', 'h', 'i', 'g', 'h', '-', 'l', 'e', 'v', 'e', 'l', ' ', 'g', 'e', 'n', 'e', 'r', 'a', 'l', '-', 'p', 'u', 'r', 'p', 'o', 's', 'e', ' ', 'p', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 
'n', 'g', ' ', 'l', 'a', 'n', 'g', 'u', 'a', 'g', 'e', '.', ' ', 'P', 'y', 't', 'h', 'o', 'n', \"'\", 's', ' ', 'd', 'e', 's', 'i', 'g', 'n', ' ', 'p', 'h', 'i', 'l', 'o', 's', 'o', 'p', 'h', 'y', ' ', 'e', 'm', 'p', 'h', 'a', 's', 'i', 'z', 'e', 's', ' ', 'c', 'o', 'd', 'e', ' ', 'r', 'e', 'a', 'd', 'a', 'b', 'i', 'l', 'i', 't', 'y', ' ', 'w', 'i', 't', 'h', ' ', 'i', 't', 's', ' ', 'n', 'o', 't', 'a', 'b', 'l', 'e', ' ', 'u', 's', 'e', ' ', 'o', 'f', ' ', 's', 'i', 'g', 'n', 'i', 'f', 'i', 'c', 'a', 'n', 't', ' ', 'i', 'n', 'd', 'e', 'n', 't', 'a', 't', 'i', 'o', 'n', '.', ' ', 'W', 'i', 'k', 'i', 'p', 'e', 'd', 'i', 'a', '\\n', 'D', 'e', 'v', 'e', 'l', 'o', 'p', 'e', 'r', ':', ' ', 'P', 'y', 't', 'h', 'o', 'n', ' ', 'S', 'o', 'f', 't', 'w', 'a', 'r', 'e', ' ', 'F', 'o', 'u', 'n', 'd', 'a', 't', 'i', 'o', 'n', '\\n', 'S', 't', 'a', 'b', 'l', 'e', ' ', 'r', 'e', 'l', 'e', 'a', 's', 'e', ':', ' ', '.', '.', ' ', '/', ' ', ' ', 'M', 'a', 'y', ' ', ';', ' ', ' ', 'd', 'a', 'y', 's', ' ', 'a', 'g', 'o', '\\n', 'P', 'r', 'e', 'v', 'i', 'e', 'w', ' ', 'r', 'e', 'l', 'e', 'a', 's', 'e', ':', ' ', '.', '.', 'b', ' ', '/', ' ', ' ', 'M', 'a', 'y', ' ', ';', ' ', ' ', 'd', 'a', 'y', 's', ' ', 'a', 'g', 'o', '\\n', 'T', 'y', 'p', 'i', 'n', 'g', ' ', 'd', 'i', 's', 'c', 'i', 'p', 'l', 'i', 'n', 'e', ':', ' ', 'D', 'u', 'c', 'k', ',', ' ', 'd', 'y', 'n', 'a', 'm', 'i', 'c', ',', ' ', 's', 't', 'r', 'o', 'n', 'g', ' ', 't', 'y', 'p', 'i', 'n', 'g', ';', ' ', 'g', 'r', 'a', 'd', 'u', 'a', 'l', ' ', '(', 's', 'i', 'n', 'c', 'e', ' ', '.', ',', ' ', 'b', 'u', 't', ' ', 'i', 'g', 'n', 'o', 'r', 'e', 'd', ' ', 'i', 'n', ' ', 'C', 'P', 'y', 't', 'h', 'o', 'n', ')', '\\n', 'F', 'i', 'r', 's', 't', ' ', 'a', 'p', 'p', 'e', 'a', 'r', 'e', 'd', ':', ' ', 'F', 'e', 'b', 'r', 'u', 'a', 'r', 'y', ' ', ';', ' ', ' ', 'y', 'e', 'a', 'r', 's', ' ', 'a', 'g', 'o', '\\n', 'P', 'a', 'r', 'a', 'd', 'i', 'g', 'm', ':', ' ', 'M', 'u', 'l', 't', 'i', '-', 'p', 'a', 'r', 'a', 'd', 'i', 'g', 'm', ':', 
' ', 'o', 'b', 'j', 'e', 'c', 't', '-', 'o', 'r', 'i', 'e', 'n', 't', 'e', 'd', ',', ' ', 'p', 'r', 'o', 'c', 'e', 'd', 'u', 'r', 'a', 'l', ' ', '(', 'i', 'm', 'p', 'e', 'r', 'a', 't', 'i', 'v', 'e', ')', ',', ' ', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', 'a', 'l', ',', ' ', 's', 't', 'r', 'u', 'c', 't', 'u', 'r', 'e', 'd', ',', ' ', 'r', 'e', 'f', 'l', 'e', 'c', 't', 'i', 'v', 'e']\n" ] } ], "source": [ "print(re.findall('\\D', s))" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['3.', '9.', '5 ', '3 ', '20', '21', '29', '3.', '10', '0b', '1 ', '3 ', '20', '21', '29', '3.', '5,', '19', '91', '30']\n" ] } ], "source": [ "print(re.findall('\\d.', s))" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['20', '21', '29', '10', '20', '21', '29', '19', '91', '30']\n" ] } ], "source": [ "print(re.findall('\\d\\d', s))" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['202', '202', '199']\n" ] } ], "source": [ "print(re.findall('\\d\\d\\d', s))" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['2021', '2021', '1991']\n" ] } ], "source": [ "print(re.findall('\\d\\d\\d\\d', s))" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "print(re.findall('\\d\\d\\d\\d\\d', s))" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "n = '4556'\n", "n2 = '365'\n", "\n", "\n", "print(re.findall('^\\d\\d\\d$', n))" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ "['365']\n" ] } ], "source": [ "print(re.findall('^\\d\\d\\d$', n2))" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "print(re.findall('^\\d\\d\\d$', '3o5'))" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\\n', ' ', ' ', ' ', '\\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '\\n', ' ', ' ', ' ', ' ', ' ', ' ', '\\n', ' ', ' ', ' ', ' ', ' ', ' ', ' ']\n" ] } ], "source": [ "print(re.findall('\\s', s))" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['P', 'y', 't', 'h', 'o', 'n', 'i', 's', 'a', 'n', 'i', 'n', 't', 'e', 'r', 'p', 'r', 'e', 't', 'e', 'd', 'h', 'i', 'g', 'h', '-', 'l', 'e', 'v', 'e', 'l', 'g', 'e', 'n', 'e', 'r', 'a', 'l', '-', 'p', 'u', 'r', 'p', 'o', 's', 'e', 'p', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g', 'l', 'a', 'n', 'g', 'u', 'a', 'g', 'e', '.', 'P', 'y', 't', 'h', 'o', 'n', \"'\", 's', 'd', 'e', 's', 'i', 'g', 'n', 'p', 'h', 'i', 'l', 'o', 's', 'o', 'p', 'h', 'y', 'e', 'm', 'p', 'h', 'a', 's', 'i', 'z', 'e', 's', 'c', 'o', 'd', 'e', 'r', 'e', 'a', 'd', 'a', 'b', 'i', 'l', 'i', 't', 'y', 'w', 'i', 't', 'h', 'i', 't', 's', 'n', 'o', 't', 'a', 'b', 'l', 'e', 'u', 's', 'e', 'o', 'f', 's', 'i', 'g', 'n', 'i', 'f', 'i', 'c', 'a', 'n', 't', 'i', 'n', 'd', 'e', 'n', 't', 'a', 't', 'i', 'o', 'n', '.', 'W', 'i', 'k', 'i', 'p', 'e', 'd', 'i', 'a', 'D', 'e', 'v', 'e', 'l', 'o', 'p', 'e', 'r', ':', 'P', 'y', 't', 'h', 'o', 'n', 'S', 'o', 'f', 't', 'w', 'a', 'r', 'e', 'F', 'o', 'u', 'n', 'd', 'a', 't', 'i', 'o', 'n', 
'S', 't', 'a', 'b', 'l', 'e', 'r', 'e', 'l', 'e', 'a', 's', 'e', ':', '3', '.', '9', '.', '5', '/', '3', 'M', 'a', 'y', '2', '0', '2', '1', ';', '2', '9', 'd', 'a', 'y', 's', 'a', 'g', 'o', 'P', 'r', 'e', 'v', 'i', 'e', 'w', 'r', 'e', 'l', 'e', 'a', 's', 'e', ':', '3', '.', '1', '0', '.', '0', 'b', '1', '/', '3', 'M', 'a', 'y', '2', '0', '2', '1', ';', '2', '9', 'd', 'a', 'y', 's', 'a', 'g', 'o', 'T', 'y', 'p', 'i', 'n', 'g', 'd', 'i', 's', 'c', 'i', 'p', 'l', 'i', 'n', 'e', ':', 'D', 'u', 'c', 'k', ',', 'd', 'y', 'n', 'a', 'm', 'i', 'c', ',', 's', 't', 'r', 'o', 'n', 'g', 't', 'y', 'p', 'i', 'n', 'g', ';', 'g', 'r', 'a', 'd', 'u', 'a', 'l', '(', 's', 'i', 'n', 'c', 'e', '3', '.', '5', ',', 'b', 'u', 't', 'i', 'g', 'n', 'o', 'r', 'e', 'd', 'i', 'n', 'C', 'P', 'y', 't', 'h', 'o', 'n', ')', 'F', 'i', 'r', 's', 't', 'a', 'p', 'p', 'e', 'a', 'r', 'e', 'd', ':', 'F', 'e', 'b', 'r', 'u', 'a', 'r', 'y', '1', '9', '9', '1', ';', '3', '0', 'y', 'e', 'a', 'r', 's', 'a', 'g', 'o', 'P', 'a', 'r', 'a', 'd', 'i', 'g', 'm', ':', 'M', 'u', 'l', 't', 'i', '-', 'p', 'a', 'r', 'a', 'd', 'i', 'g', 'm', ':', 'o', 'b', 'j', 'e', 'c', 't', '-', 'o', 'r', 'i', 'e', 'n', 't', 'e', 'd', ',', 'p', 'r', 'o', 'c', 'e', 'd', 'u', 'r', 'a', 'l', '(', 'i', 'm', 'p', 'e', 'r', 'a', 't', 'i', 'v', 'e', ')', ',', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', 'a', 'l', ',', 's', 't', 'r', 'u', 'c', 't', 'u', 'r', 'e', 'd', ',', 'r', 'e', 'f', 'l', 'e', 'c', 't', 'i', 'v', 'e']\n" ] } ], "source": [ "print(re.findall('\\S', s))" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Today I recieved 10$ from my boss'" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "print(re.findall('\\b.', 'hello'))" ] }, { "cell_type": "code", "execution_count": 
74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['P', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'P', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'P', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'P', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'P', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'P', '', '', '', '', '', '', '', '', '', '', '', '', 
'', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']\n" ] } ], "source": [ "print(re.findall('P*', s))" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['P', 'P', 'P', 'P', 'P', 'P']\n" ] } ], "source": [ "print(re.findall('P\\d*', s))" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "email = 'anilkumar_t@apssdc.in'" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['anilkumar', '', 't', '', 'apssdc', '', 'in', '']\n" ] } ], "source": [ "print(re.findall('[a-z]*', email))" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['anilkumar_t', '', 'apssdc', '', 'in', '']\n" ] } ], "source": [ "print(re.findall('[a-z_]*', email))" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['anilkumar', 't', 'apssdc', 'in']\n" ] } ], "source": [ "print(re.findall('[a-z]+', email))" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a', 'n', 'i', 'l', 'k', 'u', 'm', 'a', 'r', '', 't', '', 'a', 'p', 's', 's', 'd', 'c', '', 'i', 'n', '']\n" ] } ], "source": [ "print(re.findall('[a-z]?', email))" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['an', 'il', 'ku', 'ma', 'r_', 't@', 'ap', 'ss', 'dc', '.', 'in']\n" ] } ], "source": 
[ "print(re.findall('[a-z]?.', email))" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['an', 'il', 'ku', 'ma', 'r', 't', 'ap', 'ss', 'dc', 'in']\n" ] } ], "source": [ "print(re.findall('[a-z]?[a-z]', email))" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['em', 'ma', 'cs']\n" ] } ], "source": [ "print(re.findall('[a-z]{2}', 'emmacs'))" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['emm', 'acs']\n" ] } ], "source": [ "print(re.findall('[a-z]{3}', 'emmacs'))" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [], "source": [ "ip = '163.125.255.125'\n", "\n", "ip2 = '172.160.250.5'" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<re.Match object; span=(0, 15), match='163.125.255.125'>" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "re.match('163.125.255.[0-2][0-5][0-5]', ip)" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "print(re.match('163.125.255.[0-2][0-5][0-5]', ip2))" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0, 15)" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj = re.match('163.125.255.[0-2][0-5][0-5]', ip)\n", "\n", "\n", "obj.span()" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<re.Match object; span=(0, 15), match='163.125.255.125'>" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ip3 = '163.125.255.125opa'\n", "\n", "re.match('163.125.255.[0-2][0-5][0-5]', ip3)" ] }, { "cell_type": "code", "execution_count": 114, "metadata": 
{}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "ip3 = '163.125.255.125opa'\n", "\n", "print(re.match('^163.125.255.[0-2][0-5][0-5]$', ip3))" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<re.Match object; span=(0, 15), match='163.125.255.125'>" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "re.match('^163.125.255.[0-2][0-5][0-5]$', ip)" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'163.125.255.125'" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj.string" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rules for a valid Indian mobile number:\n", "\n", "- optional country code `+91`\n", "- first digit is one of 6, 7, 8, 9\n", "- 10 digits in total" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [], "source": [ "m1 = '9876543210'\n", "m2 = '3216549870'\n", "m3 = '987654321o'" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<re.Match object; span=(0, 10), match='9876543210'>" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pattern = '^[6789]\\d{9}$'\n", "\n", "\n", "re.match(pattern, m1)" ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None None\n" ] } ], "source": [ "print(re.match(pattern, m2), re.match(pattern, m3))" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [], "source": [ "m4 = '+919876543210'\n", "\n", "\n", "pattern = '+91^[6789]\\d{9}'" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [ { "ename": "error", "evalue": "nothing to repeat at position 0", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31merror\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in 
\u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mre\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mmatch\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mpattern\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mm4\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;32m~\\anaconda3\\lib\\re.py\u001b[0m in \u001b[0;36mmatch\u001b[1;34m(pattern, string, flags)\u001b[0m\n\u001b[0;32m 189\u001b[0m \"\"\"Try to apply the pattern at the start of the string, returning\n\u001b[0;32m 190\u001b[0m a Match object, or None if no match was found.\"\"\"\n\u001b[1;32m--> 191\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0m_compile\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mpattern\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mflags\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mmatch\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mstring\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 192\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 193\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mfullmatch\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mpattern\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mstring\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mflags\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m~\\anaconda3\\lib\\re.py\u001b[0m in \u001b[0;36m_compile\u001b[1;34m(pattern, flags)\u001b[0m\n\u001b[0;32m 302\u001b[0m \u001b[1;32mif\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[0msre_compile\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0misstring\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mpattern\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 303\u001b[0m \u001b[1;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"first argument must be string or compiled 
pattern\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 304\u001b[1;33m \u001b[0mp\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0msre_compile\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcompile\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mpattern\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mflags\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 305\u001b[0m \u001b[1;32mif\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[1;33m(\u001b[0m\u001b[0mflags\u001b[0m \u001b[1;33m&\u001b[0m \u001b[0mDEBUG\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 306\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mlen\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0m_cache\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m>=\u001b[0m \u001b[0m_MAXCACHE\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m~\\anaconda3\\lib\\sre_compile.py\u001b[0m in \u001b[0;36mcompile\u001b[1;34m(p, flags)\u001b[0m\n\u001b[0;32m 762\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0misstring\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mp\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 763\u001b[0m \u001b[0mpattern\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mp\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 764\u001b[1;33m \u001b[0mp\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0msre_parse\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mparse\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mp\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mflags\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 765\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 766\u001b[0m \u001b[0mpattern\u001b[0m \u001b[1;33m=\u001b[0m 
\u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m~\\anaconda3\\lib\\sre_parse.py\u001b[0m in \u001b[0;36mparse\u001b[1;34m(str, flags, state)\u001b[0m\n\u001b[0;32m 946\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 947\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 948\u001b[1;33m \u001b[0mp\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0m_parse_sub\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0msource\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mstate\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mflags\u001b[0m \u001b[1;33m&\u001b[0m \u001b[0mSRE_FLAG_VERBOSE\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 949\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mVerbose\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 950\u001b[0m \u001b[1;31m# the VERBOSE flag was switched on inside the pattern. 
to be\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m~\\anaconda3\\lib\\sre_parse.py\u001b[0m in \u001b[0;36m_parse_sub\u001b[1;34m(source, state, verbose, nested)\u001b[0m\n\u001b[0;32m 441\u001b[0m \u001b[0mstart\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0msource\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtell\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 442\u001b[0m \u001b[1;32mwhile\u001b[0m \u001b[1;32mTrue\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 443\u001b[1;33m itemsappend(_parse(source, state, verbose, nested + 1,\n\u001b[0m\u001b[0;32m 444\u001b[0m not nested and not items))\n\u001b[0;32m 445\u001b[0m \u001b[1;32mif\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[0msourcematch\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"|\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m~\\anaconda3\\lib\\sre_parse.py\u001b[0m in \u001b[0;36m_parse\u001b[1;34m(source, state, verbose, nested, first)\u001b[0m\n\u001b[0;32m 666\u001b[0m \u001b[0mitem\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;32mNone\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 667\u001b[0m \u001b[1;32mif\u001b[0m \u001b[1;32mnot\u001b[0m \u001b[0mitem\u001b[0m \u001b[1;32mor\u001b[0m \u001b[0mitem\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;32mis\u001b[0m \u001b[0mAT\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 668\u001b[1;33m raise source.error(\"nothing to repeat\",\n\u001b[0m\u001b[0;32m 669\u001b[0m source.tell() - here + len(this))\n\u001b[0;32m 670\u001b[0m \u001b[1;32mif\u001b[0m 
\u001b[0mitem\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;32min\u001b[0m \u001b[0m_REPEATCODES\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31merror\u001b[0m: nothing to repeat at position 0" ] } ], "source": [ "re.match(pattern, m4)" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<re.Match object; span=(0, 13), match='+919876543210'>" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pattern = '^[+]91[6789]\\d{9}$'\n", "\n", "\n", "re.match(pattern, m4)" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "m3 = '+9164578d9656'\n", "\n", "print(re.match(pattern, m3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task\n", "\n", "\n", "### userName@domain.extension\n", "\n", "\n", "- userName\n", "    - allowed characters: a-zA-Z0-9._\n", "    - must not start with `.`\n", "    - min 8, max 15 characters\n", "- @\n", "- Domain\n", "    - allowed characters: a-zA-Z0-9\n", "    - min 2, max 10 characters\n", "- Extension\n", "    - allowed characters: a-zA-Z\n", "    - min 2, max 5 characters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### re.sub -> replace matches\n", "\n", "### Syntax\n", "\n", "`re.sub(pattern, repl, string)`" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"Python-is-an-interpreted-high-level-general-purpose-programming-language.-Python's-design-philosophy-emphasizes-code-readability-with-its-notable-use-of-significant-indentation.-Wikipedia-Developer:-Python-Software-Foundation-Stable-release:-3.9.5-/-3-May-2021;-29-days-ago-Preview-release:-3.10.0b1-/-3-May-2021;-29-days-ago-Typing-discipline:-Duck,-dynamic,-strong-typing;-gradual-(since-3.5,-but-ignored-in-CPython)-First-appeared:-February-1991;-30-years-ago-Paradigm:-Multi-paradigm:-object-oriented,-procedural-(imperative),-functional,-structured,-reflective\n" ] } ], "source": [ "print(re.sub('\\s', '-', s))" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. Wikipedia\n", "Developer: Python Software Foundation\n", "Stable release: 0.0.0 / 0 May 0000; 00 days ago\n", "Preview release: 0.00.0b0 / 0 May 0000; 00 days ago\n", "Typing discipline: Duck, dynamic, strong typing; gradual (since 0.0, but ignored in CPython)\n", "First appeared: February 0000; 00 years ago\n", "Paradigm: Multi-paradigm: object-oriented, procedural (imperative), functional, structured, reflective\n" ] } ], "source": [ "print(re.sub('\\d', '0', s))" ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(\"Python is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. 
Wikipedia\\nDeveloper: Python Software Foundation\\nStable release: 0.0.0 / 0 May 0000; 00 days ago\\nPreview release: 0.00.0b0 / 0 May 0000; 00 days ago\\nTyping discipline: Duck, dynamic, strong typing; gradual (since 0.0, but ignored in CPython)\\nFirst appeared: February 0000; 00 years ago\\nParadigm: Multi-paradigm: object-oriented, procedural (imperative), functional, structured, reflective\", 30)\n" ] } ], "source": [ "print(re.subn('\\d', '0', s))" ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python***is***an***interpreted***high***level***general***purpose***programming***language******Python***s***design***philosophy***emphasizes***code***readability***with***its***notable***use***of***significant***indentation******Wikipedia***Developer******Python***Software***Foundation***Stable***release******3***9***5*********3***May***2021******29***days***ago***Preview***release******3***10***0b1*********3***May***2021******29***days***ago***Typing***discipline******Duck******dynamic******strong***typing******gradual******since***3***5******but***ignored***in***CPython******First***appeared******February***1991******30***years***ago***Paradigm******Multi***paradigm******object***oriented******procedural******imperative*********functional******structured******reflective\n" ] } ], "source": [ "print(re.sub('\\W', '***', s))" ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "****** ** ** *********** ********** *************** *********** ********* ******** ****** ********** ********** **** *********** **** *** ******* *** ** *********** ************ *********\n", "********** ****** ******** **********\n", "****** ******** ***** * * *** ***** ** **** ***\n", "******* ******** ******** * * *** ***** ** **** ***\n", "****** *********** ***** ******** ****** ******* ******* ****** **** *** 
******* ** ********\n", "***** ********* ******** ***** ** ***** ***\n", "********* *************** **************** ********** ************* *********** *********** **********\n" ] } ], "source": [ "print(re.sub('\\S', '*', s))" ] }, { "cell_type": "code", "execution_count": 131, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('****** ** ** *********** ********** *************** *********** ********* ******** ****** ********** ********** **** *********** **** *** ******* *** ** *********** ************ *********\\n********** ****** ******** **********\\n****** ******** ***** * * *** ***** ** **** ***\\n******* ******** ******** * * *** ***** ** **** ***\\n****** *********** ***** ******** ****** ******* ******* ****** **** *** ******* ** ********\\n***** ********* ******** ***** ** ***** ***\\n********* *************** **************** ********** ************* *********** *********** **********', 492)\n" ] } ], "source": [ "print(re.subn('\\S', '*', s))" ] }, { "cell_type": "code", "execution_count": 132, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(\"Pyth*n *s *n *nt*rpr*t*d h*gh-l*v*l g*n*r*l-p*rp*s* pr*gr*mm*ng l*ng**g*. Pyth*n's d*s*gn ph*l*s*phy *mph*s*z*s c*d* r**d*b*l*ty w*th *ts n*t*bl* *s* *f s*gn*f*c*nt *nd*nt*t**n. 
W*k*p*d**\\nD*v*l*p*r: Pyth*n S*ftw*r* F**nd*t**n\\nSt*bl* r*l**s*: 3.9.5 / 3 M*y 2021; 29 d*ys *g*\\nPr*v**w r*l**s*: 3.10.0b1 / 3 M*y 2021; 29 d*ys *g*\\nTyp*ng d*sc*pl*n*: D*ck, dyn*m*c, str*ng typ*ng; gr*d**l (s*nc* 3.5, b*t *gn*r*d *n CPyth*n)\\nF*rst *pp**r*d: F*br**ry 1991; 30 y**rs *g*\\nP*r*d*gm: M*lt*-p*r*d*gm: *bj*ct-*r**nt*d, pr*c*d*r*l (*mp*r*t*v*), f*nct**n*l, str*ct*r*d, r*fl*ct*v*\", 163)\n" ] } ], "source": [ "print(re.subn('[aeiou]', '*', s))" ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('****o* i* a* i**e***e*e* *i****e*e* *e*e*a***u**o*e **o**a**i** *a**ua*e* ****o*** *e*i** **i*o*o*** e***a*i*e* *o*e *ea*a*i*i** *i** i** *o*a**e u*e o* *i**i*i*a** i**e**a*io** *i*i*e*ia\\n*e*e*o*e** ****o* *o***a*e *ou**a*io*\\n**a**e *e*ea*e* ***** * * *a* ***** ** *a** a*o\\n**e*ie* *e*ea*e* ******** * * *a* ***** ** *a** a*o\\n***i** *i**i**i*e* *u*** ***a*i** ***o** ***i*** **a*ua* **i**e **** *u* i**o*e* i* *****o**\\n*i*** a**ea*e** *e**ua** ***** ** *ea** a*o\\n*a*a*i*** *u**i**a*a*i*** o**e***o*ie**e** **o*e*u*a* *i**e*a*i*e** *u***io*a** ***u**u*e** *e**e**i*e', 329)\n" ] } ], "source": [ "print(re.subn('[^aeiou\\s]', '*', s))" ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [], "source": [ "d1 = 'abc'\n", "d2 = 'a'" ] }, { "cell_type": "code", "execution_count": 141, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['abc']\n" ] } ], "source": [ "print(re.findall('[a-z][a-z]+', d1))" ] }, { "cell_type": "code", "execution_count": 142, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "print(re.findall('[a-z][a-z]+', d2))" ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a']\n" ] } ], "source": [ "print(re.findall('[a-z][a-z]*', 
d2))" ] }, { "cell_type": "code", "execution_count": 143, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'{abs:val}'" ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'{abs:val}'" ] }, { "cell_type": "code", "execution_count": 145, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(\"Python@is@an@interpreted@high-level@general-purpose@programming@language.@Python's@design@philosophy@emphasizes@code@readability@with@its@notable@use@of@significant@indentation.@Wikipedia@Developer:@Python@Software@Foundation@Stable@release:@3.9.5@/@3@May@2021;@29@days@ago@Preview@release:@3.10.0b1@/@3@May@2021;@29@days@ago@Typing@discipline:@Duck,@dynamic,@strong@typing;@gradual@(since@3.5,@but@ignored@in@CPython)@First@appeared:@February@1991;@30@years@ago@Paradigm:@Multi-paradigm:@object-oriented,@procedural@(imperative),@functional,@structured,@reflective\", 73)\n" ] } ], "source": [ "print(re.subn('[^aeiou\\S]', '@', s))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`[^\\S]` (anything that is not a non-whitespace character) matches the same characters as `\\s`, so `[^aeiou\\S]` above replaces only whitespace: the vowels are already covered by `\\S`." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }