How to Automate XML, YAML, and CSV Files Using Python

Reading and Writing a YML file using python

YAML (stored in .yaml or .yml files) is a superset of JSON. Some automation tools, such as Ansible, use YAML-based files, called playbooks, to define the actions you want to automate.

Working with YAML files is fun in Python, so let's get started. To work with YAML files you need to install the PyYAML library, because Python's standard library does not include a YAML module. PyYAML is a YAML parser and emitter for Python.

  • Run the following command in the terminal of your favorite code editor, such as Visual Studio Code, to install the PyYAML library.
pip install PyYAML
  • Next, create a folder named Python, create a YAML file named apache.yml inside it, paste in the content below, and save it.
- hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
  - name: ensure apache is at the latest version
    yum:
      name: httpd
      state: latest
  • Next, create another Python file in the same Python folder and paste the code below into it.

The Python script below imports the yaml module to work with YAML files and the pprint module to print data in a readable layout. It opens apache.yml with open() and reads the data with yaml.safe_load(). It then writes the same data back to the file with yaml.dump(). Because yaml.dump() is given a file object, it writes to that file and returns None, so verify_apache2 is None.

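import yaml
from pprint import pprint

# Read the playbook and parse it into Python lists and dictionaries
with open('apache.yml', 'r') as new_file:
    verify_apache = yaml.safe_load(new_file)
pprint(verify_apache)

# Write the parsed data back; yaml.dump() returns None when given a file object
with open('apache.yml', 'w') as new_file2:
    verify_apache2 = yaml.dump(verify_apache, new_file2)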

  • Execute the above python script using python command and you should see the below output.
[{'hosts': 'webservers',
  'remote_user': 'root',
  'tasks': [{'name': 'ensure apache is at the latest version',
             'yum': {'name': 'httpd', 'state': 'latest'}}],
  'vars': {'http_port': 80, 'max_clients': 200}}]

Reading and Writing an XML file using python

XML files are used mostly for structured data. Many web systems use XML to transfer data. One example is RSS (Really Simple Syndication) feeds, which let you follow the latest updates from many websites. Python's standard library includes the xml package for working with XML.

  • Next, in the same Python folder, create an XML file named book.xml, paste in the content below, and save it. XML has a tree-like structure: the top element is the root, and the elements nested under it are its children.
<?xml version="1.0"?>
<catalog>
   <book id="bk109">
      <title>Automate Infra Part 2</title>
      <genre>Science Fiction</genre>
   </book>
   <book id="bk112">
      <title>Automate Infra Part 1</title>
      <genre>Computer</genre>
   </book>
</catalog>

  • Next, create another Python file in the same Python folder and paste the code below into it.
  • In the script below, the xml.etree.ElementTree module provides a simple and efficient API for parsing and creating XML data. The script parses the whole book.xml file into a tree, then prints the tag and attributes of the root element and of each of its children.

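    import xml.etree.ElementTree as ET
    tree = ET.parse('book.xml')          # parse the whole file into a tree
    root = tree.getroot()                # get the root element
    print(root.tag, root.attrib)         # tag and attributes of the root
    for child in root:                   # each child of the root
        print(child.tag, child.attrib)   # tag and attributes of the child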

  • Execute the above python script using python command and you should see the below output.
  • O/P:
    catalog {}
    book {'id': 'bk109'}
    book {'id': 'bk112'}
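
The heading promises writing as well as reading, so here is a minimal sketch of both, assuming the same catalog/book layout as book.xml. It parses an inline string instead of a file so it runs on its own. The `language` element and its value are hypothetical additions for illustration, not part of the original file.

```python
import xml.etree.ElementTree as ET

# Inline sample mirroring the structure of book.xml.
xml_text = """<catalog>
   <book id="bk109">
      <title>Automate Infra Part 2</title>
   </book>
   <book id="bk112">
      <title>Automate Infra Part 1</title>
   </book>
</catalog>"""

root = ET.fromstring(xml_text)

# Read: map each book id to the text of its <title> child.
titles = {book.get("id"): book.findtext("title") for book in root.findall("book")}
print(titles)

# Write: add a new child element to the first book (hypothetical tag and value),
# then serialize the whole tree back to a string.
first_book = root.find("book")
language = ET.SubElement(first_book, "language")
language.text = "English"
updated_xml = ET.tostring(root, encoding="unicode")
print(updated_xml)
```

To save the result to disk instead of a string, `ET.ElementTree(root).write(...)` takes a file path of your choice.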

    Reading and Writing a comma-separated values (CSV) file using python

    CSV is the most widely used plain-text format for spreadsheet data. To work with these files in Python, import the built-in csv module. Let's learn how to read and write CSV data.

  • Next, in the same Python folder, create a CSV file named devops.csv, add content similar to the following, and save it.
    Date, PreviousUserCount, UserCountTotal, sitepage
    02-01-2021,61,5336,
    03-01-2021,42,5378,
    04-01-2021,26,5404,
    05-01-2021,65,5469,
    • Next, create another Python file in the same Python folder and paste the code below into it.

    The script below uses the csv module to work with CSV files. It opens the CSV file with open(), wraps it in csv.reader(), prints the first five rows, and finally prints the reader object itself.

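    import csv
    with open('devops.csv', 'r', newline='') as csv_file:
        read = csv.reader(csv_file, delimiter=',')
        for _ in range(5):
            print(next(read))   # print the next row as a list of strings
        print(read)             # print the reader object itself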
    • Execute the above python script using python command and you should see the below output.
    ['Date', ' PreviousUserCount', ' UserCountTotal', ' sitepage']
    ['02-01-2021', '61', '5336', '']
    ['03-01-2021', '42', '5378', '']
    ['04-01-2021', '26', '5404', '']
    ['05-01-2021', '65', '5469', '']
    <_csv.reader object at 0x0336A370>
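
The example above only reads. Below is a minimal sketch of reading by column name and writing rows back out, assuming a simplified version of devops.csv (three columns, no spaces in the header). `io.StringIO` stands in for real files so the snippet is self-contained.

```python
import csv
import io

# Two sample rows taken from the output above; the header is simplified (an assumption).
sample = (
    "Date,PreviousUserCount,UserCountTotal\n"
    "02-01-2021,61,5336\n"
    "03-01-2021,42,5378\n"
)

# DictReader uses the header row as the keys of each record.
rows = list(csv.DictReader(io.StringIO(sample)))
total_previous = sum(int(row["PreviousUserCount"]) for row in rows)
print(total_previous)

# DictWriter writes selected columns back out in CSV form.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Date", "UserCountTotal"], lineterminator="\n")
writer.writeheader()
for row in rows:
    writer.writerow({"Date": row["Date"], "UserCountTotal": row["UserCountTotal"]})
csv_text = out.getvalue()
print(csv_text)
```

With a real file, replace `io.StringIO(...)` with `open(path, newline='')` for reading and `open(path, 'w', newline='')` for writing.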

    Python – Pandas (Data Analysis 3rd Party Library)

    pandas is a third-party data analysis library. Its pandas.DataFrame acts like a data table, similar to a very powerful spreadsheet. If you want to work with rows or columns as you would in a spreadsheet, the DataFrame is the tool for you. Let's get started by installing it with pip install pandas.

    import pandas as pd
    df = pd.read_csv('devops.csv')
    print(type(df))       # the object returned by read_csv is a DataFrame
    print(df.head(4))     # first 4 rows of devops.csv
    print(df.describe())  # statistical summary of the numeric columns
    <class 'pandas.core.frame.DataFrame'>
             Date   PreviousUserCount   UserCountTotal                  sitepage
    0  02-01-2021                  61             5336
    1  03-01-2021                  42             5378
    2  04-01-2021                  26             5404
    3  05-01-2021                  65             5469
            PreviousUserCount   UserCountTotal
    count            4.000000         4.000000
    mean            48.500000      5396.750000
    std             18.046237        55.721779
    min             26.000000      5336.000000
    25%             38.000000      5367.500000
    50%             51.500000      5391.000000
    75%             62.000000      5420.250000
    max             65.000000      5469.000000
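
To see what the describe() figures mean, the same numbers can be cross-checked with Python's built-in statistics module. This is only a sketch: the list below is the PreviousUserCount column typed in by hand from the output above, and it relies on describe() reporting the sample standard deviation.

```python
import statistics

# PreviousUserCount values from the four rows shown above.
previous_user_count = [61, 42, 26, 65]

mean = statistics.mean(previous_user_count)
std = statistics.stdev(previous_user_count)      # sample standard deviation (n - 1)
median = statistics.median(previous_user_count)  # the 50% row in describe()
print(mean, round(std, 6), median)
```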

    PYTHON : Regular Expressions to Search Text ( * MOSTLY USED AND IMPORTANT)

    Two examples of searching text with regular expressions (useful in many areas, such as analysis, HR, and sales):

    import re
    name_list = '''Ezra Sharma <>,
    Rostam Bat   <>,
    Chris Taylor <>,
    Bobbi Baio <>'''
    # Some commonly used character classes are \w, which for ASCII text is equivalent to [a-zA-Z0-9_], and \d, which is equivalent to [0-9].
    # You can use the + modifier to match one or more characters:
    print(re.search(r'Rostam', name_list))
    print(re.search(r'[RB]obb[yi]', name_list))
    print(re.search(r'Chr[a-z][a-z]', name_list))
    print(re.search(r'[A-Za-z]+', name_list))
    print(re.search(r'[A-Za-z]{5}', name_list))
    print(re.search(r'[A-Za-z]{7}', name_list))
    print(re.search(r'[A-Za-z]+@[a-z]+\.[a-z]+', name_list))
    print(re.search(r'\w+', name_list))
    print(re.search(r'\w+\@\w+\.\w+', name_list))
    print(re.search(r'(\w+)\@(\w+)\.(\w+)', name_list))
    <re.Match object; span=(49, 55), match='Rostam'>
    <re.Match object; span=(147, 152), match='Bobbi'>
    <re.Match object; span=(98, 103), match='Chris'>
    <re.Match object; span=(0, 4), match='Ezra'>
    <re.Match object; span=(5, 10), match='Sharm'>
    <re.Match object; span=(13, 20), match='esharma'>
    <re.Match object; span=(13, 38), match=''>
    <re.Match object; span=(0, 4), match='Ezra'>
    <re.Match object; span=(13, 38), match=''>
    <re.Match object; span=(13, 38), match=''>
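
The last pattern above uses parentheses to capture parts of an email address. Here is a small sketch of how those captured groups can be read back. The addresses are hypothetical placeholders on example.com, since the original addresses are not shown in the listing, and the group names are chosen only for illustration.

```python
import re

# Hypothetical addresses; not the ones used in the original listing.
contacts = "Ezra Sharma <esharma@example.com>, Bobbi Baio <bbaio@example.com>"

# Named groups make each captured part available by name.
pattern = re.compile(r"(?P<user>\w+)@(?P<domain>\w+)\.(?P<tld>\w+)")

first = pattern.search(contacts)
print(first.group("user"), first.group("domain"), first.group("tld"))

# findall returns one tuple of captured groups per match.
all_matches = pattern.findall(contacts)
print(all_matches)
```
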
    # <IP Address> <Client Id> <User Id> <Time> <Request> <Status> <Size>
    import re
    Line1 = ' - Automateinfra1 [13/Nov/2021:14:43:30 -0800] "GET /assets/234 HTTP/1.0" 200 2326'
    access_log = ''' - Automateinfra1 [13/Nov/2021:14:43:30 -0800] "GET /assets/234 HTTP/1.0" 200 2326
     - Automateinfra2 [13/Nov/2021:14:43:30 -0800] "GET /assets/235 HTTP/1.0" 200 2324
     - Automateinfra3 [13/Nov/2021:14:43:30 -0800] "GET /assets/236 HTTP/1.0" 200 2325'''
    count_ip = r'(?P<IP>\d+\.\d+\.\d+\.\d+)'
    count_time = r'(?P<Time>\d\d/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})'
    count_clientid = r'(?P<User>".+")'
    count_request = r'(?P<Request>".+")'
    sol = re.search(r'(?P<IP>\d+\.\d+\.\d+\.\d+)', Line1)   # search a single line for an IP address
    print(re.search(count_request, Line1))                  # match the quoted request
    print(re.search(count_time, Line1))                     # match the timestamp
    value = re.finditer(count_ip, access_log)               # iterate over every IP match in the log
    for sol in value:
        print(sol.group('IP'))
    <re.Match object; span=(56, 82), match='"GET /assets/234 HTTP/1.0"'>
    <re.Match object; span=(28, 48), match='13/Nov/2021:14:43:30'>
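
The separate patterns above can also be combined into one expression that splits a whole log line into named fields. The sketch below assumes the field order given in the comment above (IP address, client id, user id, time, request, status, size). The IP address 192.0.2.1 is a documentation-only placeholder, and the group names Status and Size are introduced here for illustration.

```python
import re

# 192.0.2.1 is a placeholder address; the rest of the line follows the sample log format.
log_line = ('192.0.2.1 - Automateinfra1 [13/Nov/2021:14:43:30 -0800] '
            '"GET /assets/234 HTTP/1.0" 200 2326')

# One pattern with a named group per field.
log_pattern = re.compile(
    r'(?P<IP>\d+\.\d+\.\d+\.\d+) - (?P<User>\S+) '
    r'\[(?P<Time>\d\d/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\] '
    r'(?P<Request>".+") (?P<Status>\d{3}) (?P<Size>\d+)'
)

match = log_pattern.search(log_line)
fields = match.groupdict()   # dictionary of group name -> matched text
print(fields)
```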


    Rather than loading the whole file into memory as you have done up until now, you can read one line at a time, process the line, and then move to the next. The lines are removed from memory automatically by Python’s garbage collector, freeing up memory.

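    with open("devops.txt", mode="r") as mynewfile:       # for a binary file such as a PDF, use "rb" here and "wb" below
        with open("devops-corrected.txt", "w") as target_file:
            for line in mynewfile:                        # only one line is held in memory at a time
                target_file.write(line)                   # process the line here, then write it out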
    # Read a file in fixed-size chunks of bytes
    with open('book.xml', 'rb') as sourcefile:
        while True:
            chunk = sourcefile.read(1024)  # read up to 1024 bytes
            if chunk:
                print(chunk)
            else:
                break                      # an empty chunk means end of file
    b'<?xml version="1.0"?>\r\n<catalog>\r\n   <book id="bk109">\r\n      <author>Author1</author>\r\n      <title>Automate Infra Part 2</title>\r\n      <genre>Science Fiction</genre>\r\n      <price>6.95</price>\r\n    
      <publish_date>2000-11-02</publish_date>\r\n      <description>book1</description>\r\n   </book>\r\n   <book id="bk112">\r\n      <author>Author2</author>\r\n      <title>Automate Infra Part 1</title>\r\n      <genre>Computer</genre>\r\n      <price>49.95</price>\r\n      <publish_date>2001-04-16</publish_date>\r\n      <description>book2</description>\r\n   </book>\r\n</catalog>'
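
The chunked-read loop can also be wrapped in a small generator so the caller just iterates over chunks. This is a sketch, not the only way to do it: `read_in_chunks` is a hypothetical helper name, and `io.BytesIO` stands in for a file opened in "rb" mode.

```python
import io
from functools import partial

def read_in_chunks(stream, chunk_size=1024):
    """Yield chunks of at most chunk_size bytes until the stream is exhausted."""
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # until it returns the sentinel b"" (end of data).
    yield from iter(partial(stream.read, chunk_size), b"")

# 2500 bytes of dummy data in place of a real binary file.
source = io.BytesIO(b"x" * 2500)
sizes = [len(chunk) for chunk in read_in_chunks(source)]
print(sizes)
```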


    You will often need to protect text with hashing or encryption. Python's built-in hashlib package provides hash functions, and the widely used third-party package cryptography provides encryption.

    hashlib: Provides hash functions, including SHA1, SHA224, SHA256, SHA384, SHA512, and RSA's MD5 algorithm. A hash is one-way: it cannot be decrypted back to the original text.


    Symmetric key encryption: It is based on a shared key that is used both to encrypt and to decrypt. These algorithms include Advanced Encryption Standard (AES), Blowfish, Data Encryption Standard (DES), Serpent, and Twofish.

    Asymmetric key encryption: It is based on a public key, which is widely shared, and a private key, which is kept secret.

    # Hashing using hashlib
    import hashlib                  # Python built-in package
    line = "I like editing"
    bline = line.encode()           # Convert the string to bytes
    print(bline)                    # Print the byte string
    algo = hashlib.md5()            # Create an MD5 hash object (MD5 is no longer considered secure; hashlib.sha256() is a stronger choice)
    algo.update(bline)              # Feed the bytes to the hash object
    print("Encrypted  text Message")
    print(algo.digest())            # Print the resulting hash (digest)
    # Encryption using cryptography (symmetric key encryption)
    from cryptography.fernet import Fernet  # Third-party package, so you need pip install cryptography
    key = Fernet.generate_key()             # Generate the key
    print("Generating the keys ")
    print(key)                              # Print the key
    algo = Fernet(key)                      # Fernet object that encrypts with AES using this key
    message = b"I definetely like Editing"
    encrypted = algo.encrypt(message)
    print("Encrypted  text Message ")
    print(encrypted)                        # Print the encrypted token
    print(algo.decrypt(encrypted))          # Decrypt with the same key
    # Encryption using cryptography (asymmetric key encryption)
    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives.asymmetric import padding, rsa
    from cryptography.hazmat.primitives import hashes
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=4096, backend=default_backend())  # Generate the private key
    print(private_key)   # Print the private key object
    public_key = private_key.public_key()   # Derive the public key from the private key
    print(public_key)    # Print the public key object
    message = b"I am equally liking Editing"
    encrypted = public_key.encrypt(message, padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()), algorithm=hashes.SHA256(), label=None))
    print(encrypted)     # Print the encrypted bytes
    decrypted = private_key.decrypt(encrypted, padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()), algorithm=hashes.SHA256(), label=None))
    print(decrypted)     # Print the decrypted message
    b'I like editing'
    Encrypted  text Message
    Generating the keys
    Encrypted  text Message
    b'I definetely like Editing'
    <cryptography.hazmat.backends.openssl.rsa._RSAPrivateKey object at 0x036491D8>
    <cryptography.hazmat.backends.openssl.rsa._RSAPublicKey object at 0x03850E38>
    b"\x8b\xec\xb0\x91\xec\xe7\x8d;\x11\xbclch\xbdVD@c\xd3J\x07'\xe9\x07\x15\x1c@=^\xd2h\xcaDL\x95\xea[\x0fv\x012\xed\xd5\xed\x0e\x9b\x93V2\x00\xba\x9c\x07\xba\x8b\xf3\xcb\x03M\xa8\xb1\x12ro\xae\xc0\xfb$\xf9\xcc\x85\xe8s\xfc`{\xfe{\x88\xd2\xc3\xffI\x90\xe3\xd2\x1e\x82\x95\xdfe<\xd5\r\x0b\xc4z\xc4\xf7\x00\xcfr\x07npm0\xd4\xc4\xa4>w\x9d]\xcf\xae7F\x91&\x93\xd5\xda\xcaR\x13A\x8ewB\xf6\xd9\xae\xce\xca\x8f\xd6\x91\x06&:\x00\xa0\x84\x05#,\x7fdA\x87\xb2\xe7\x1d\x8b*\xa15\xf8\xb0\x07\xa0n\x1e\xeaI\x02\xbaA\x88ut\x8e\x82<\xfe\xbfM\xe6F\xa3\xcc\xd4\x8b\x80PY\xb5\xd3\x14}C\xe2\x83j\xaf\x85\xa6\x9e\x19\xb2\xd9\xb8\xac\xa4\xfb\x1f\x0c\xce\x9d4\x82\x1e\xfd5\xb49\xa5\xbbL\x01~\x8fA\xee\r\xc7\x84\x9e\x0c\t\x15z\r\xfd]\x0b\xcfW\x01\xd2\x16\x17btc\xeaSl\xf5\xb0\x8a\xe2X\xe7\xa7a\xa7\xf7M\x01\xa2\x0b8\xd6\xf2\xc5c\xbf\xea\xe0\x80\x15\xde-\x98\xa1\xc8ud*\xbel2\xb5\xc8:\x92\xd5\r(_8\xbd\xcb\x80\xf1\x93\x83\xe2\x9f\xed\x82f\xd0\xb2\x8f\x1b\x9eMC\x07\xf9\x08\xb0\x00QA\xea\x93\xc7@&\x84\xff<\xde\x80@\xc8\xc6\x83O&%\x91r-\xb0\xef}\x18tU{C\xa6\x17\x97\x1b\x95g\xc5\x0e>{\xb0\x94a)\xbc)*Sq\x98\xad\xf3>\x04\x9b+x\x95&\xa6\xe6,\xb4~\xf2Y\x06,\xab'uq \x9f0\x7f\xb5\xd50\xbdp\xbb\xdf\x1c\xe9\xb1\xc4\x88y\nq\\\x85\x1e\xd8\x18M\x87\x1aU.\x918;\xcd\x10 \x9b\x11\xf9R\xd3\x8fz\xe8\xf6|C\xfb\x1f\xfd1\x19\x10:>\x1c\x06\x8e\xda\x98\xb2\xf3aa^\xa54\x03\xf8\x03\xc4\xe6\xd9mw\r\x8b\x96\xa2rJ\x03\xe7\xda\x0f\rJ-iPo!^\x8a\xdcg\x8c!L\xa4\xedY\xe5\x12\xdf\xe8\xe7\x0cE\xcd\xa2\xa2Gr\xc0\xe1\xa6\xc5\x9a\x9f\x07\x89\x84\x8b\xb7"
    b'I am equally liking Editing'
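
Because a hash is one-way, it is typically used to compare or verify data rather than to recover it. The sketch below uses SHA-256 from hashlib (swapped in here in place of MD5) and hexdigest(), which returns the hash as printable hexadecimal text. The second message is a made-up variation used only to show that a small change gives a different hash.

```python
import hashlib

line = "I like editing"

# Hashing the same input twice gives the same digest.
digest_a = hashlib.sha256(line.encode()).hexdigest()
digest_b = hashlib.sha256(line.encode()).hexdigest()

# Changing a single character gives a different digest.
digest_c = hashlib.sha256(b"I like Editing").hexdigest()

print(len(digest_a))          # number of hex characters in a SHA-256 digest
print(digest_a == digest_b)   # same input compared with itself
print(digest_a == digest_c)   # original compared with the changed input
```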


    The os module gives access to many low-level operating system calls and offers a consistent interface across operating systems such as Unix and Windows.

    import os   # Python built-in package
    print(os.listdir('.'))   # List the entries in the current directory
    os.chmod('automateinfra.txt', 0o777) # Set the file permissions; the mode must be octal (0o777), not decimal 777
    os.mkdir('/tmp/automateinfra.pdf') # Make the directory
    os.rmdir('/tmp/automateinfra.pdf') # Remove the directory
    os.stat('b.txt')  # These stats include st_mode, the file type and permissions, and st_atime, the time the item was last accessed.
    cur_dir = os.getcwd()  # Get the current working directory.
    print(os.path.dirname(cur_dir))   # Returns the parent directory path
    print(os.path.split(cur_dir))     # Splits the path into (parent path, last component)
    print(os.path.basename(cur_dir))  # Returns the last component of the path
    while os.path.basename(cur_dir):   # Continue while the path still has a last component
        cur_dir = os.path.dirname(cur_dir)  # Move up to the parent directory
        print(cur_dir)                      # Print each parent directory in turn
    ('C:\\Users\\AutomateInfra\\Desktop\\GIT\\Python-Desktop', 'Basics')
    import os
    # Check the current working directory
    file_name = "automateinfra.txt"
    file_path = os.path.join(os.getcwd(), file_name)
    print(f"Checking {file_path}")
    if os.path.exists(file_path):
        print(f"Found {file_name} in the current working directory")
    # Check the user's home directory
    home_dir = os.path.expanduser("~/")  # expanduser() returns the path to the user's home directory
    file_path = os.path.join(home_dir, file_name)
    print(f"Checking {file_path}")
    if os.path.exists(file_path):
        print(f"Found {file_name} in the home directory")
    Checking C:\Users\Automateinfra\Desktop\GIT\Python-Desktop\Basics\automateinfra.txt
    Checking C:\Users\Automateinfra/automateinfra.txt
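
The same "check one directory, then another" logic can be written as a small function that takes a list of directories. This is a sketch under stated assumptions: `find_file` is a hypothetical helper, and a temporary directory is used so the example does not depend on files on your machine.

```python
import os
import tempfile

def find_file(file_name, search_dirs):
    """Return the first existing path for file_name in search_dirs, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, file_name)
        if os.path.exists(candidate):
            return candidate
    return None

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create one file and one empty subdirectory to search through.
    target = os.path.join(tmp_dir, "automateinfra.txt")
    with open(target, "w") as handle:
        handle.write("demo")
    empty_dir = os.path.join(tmp_dir, "empty")
    os.mkdir(empty_dir)

    found = find_file("automateinfra.txt", [empty_dir, tmp_dir])
    missing = find_file("no-such-file.txt", [empty_dir, tmp_dir])
    print(found == target, missing)
```

In the original scenario, the list of directories would be `[os.getcwd(), os.path.expanduser("~")]`.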

