Example 5: input file formats

This example shows the three file formats that sportran can read:

  • ``table``: a plain-text file where data is organized in columns.

  • ``dict``: a NumPy binary file that contains a dictionary.

  • ``lammps``: a LAMMPS output log file.

[1]:
import numpy as np
try:
    import sportran as st
except ImportError:
    # sportran is not installed: fall back to the copy in the parent directory
    from sys import path
    path.append('..')
    import sportran as st

data_path = './data/Silica/'

1. table format

A table-type file is a plain-text file containing multiple time series.

  • Each row represents a data point (a specific time), and each column represents a different time series (i.e. a variable, e.g. temperature, energy, heat flux, …).

  • The first non-comment line contains the names of the variables, one per column. Cartesian components of vector quantities can be expressed with square brackets (e.g. vector[1]).

  • Following LAMMPS conventions, the c_ and v_ prefixes of column names are ignored, so e.g. c_flux is saved as flux.

  • Comment lines start with #.

For further information, see the documentation of TableFile.
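To make the column-naming rules above concrete, here is a minimal pure-Python sketch of how such a file could be parsed into a dictionary of arrays. It is not sportran's TableFile implementation: the helpers parse_table_header and read_table are hypothetical, and the sketch assumes that the header is the first non-comment line and that vector components appear in consecutive columns, in order.

```python
import re

import numpy as np


def parse_table_header(header_line):
    """Map each variable name to the list of columns that hold it."""
    ckeys = {}
    for col, name in enumerate(header_line.split()):
        # drop LAMMPS-style c_ / v_ prefixes (e.g. c_flux1[1] -> flux1[1])
        if name.startswith(('c_', 'v_')):
            name = name[2:]
        # a trailing [k] marks the k-th Cartesian component of a vector
        match = re.fullmatch(r'(.+)\[(\d+)\]', name)
        base = match.group(1) if match else name
        ckeys.setdefault(base, []).append(col)
    return ckeys


def read_table(lines):
    """Return {name: array of shape (nsteps, ncolumns)} from table-format lines."""
    # ignore blank lines and comment lines starting with '#'
    rows = [ln for ln in lines if ln.strip() and not ln.lstrip().startswith('#')]
    ckeys = parse_table_header(rows[0])  # first non-comment line = header
    values = np.array([[float(x) for x in ln.split()] for ln in rows[1:]])
    return {name: values[:, cols] for name, cols in ckeys.items()}


sample = """# LAMMPS metal units
Temp c_flux1[1] c_flux1[2] c_flux1[3]
998.48171 -265.30586 1520.6107 67.461829
1003.699 -168.68352 1377.4459 101.82146
""".splitlines()

table = read_table(sample)
print(table['Temp'].shape, table['flux1'].shape)  # (2, 1) (2, 3)
```

Grouping the three flux1 columns under one key mirrors the all_ckeys mapping printed by TableFile in the cells below.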

[2]:
# Example
!head -n 15 ./data/Silica/Silica.dat
# Solid Silica - BKS potential, melted and quenched
# 216 atoms, T~1000K, dens~2.295g/cm^3
# NVE, dt = 1.0 fs, 100 ps, print_step = 1.0 fs
# Temperature = 983.172635 K, Volume = 3130.431110818 A^3
# LAMMPS metal units
Temp c_flux1[1] c_flux1[2] c_flux1[3]
998.48171 -265.30586 1520.6107 67.461829
1003.699 -168.68352 1377.4459 101.82146
1003.8906 -93.688306 1180.375 117.20939
998.1473 -42.571972 932.96168 111.11515
986.48517 -15.323416 642.52765 85.389352
969.86291 -10.876607 319.90865 45.695167
950.03861 -27.873411 -21.428315 -0.1944876
929.29852 -64.46361 -366.51677 -44.776231
910.08762 -117.84517 -700.11875 -82.966928
[3]:
# Load table file
f = st.i_o.TableFile(data_path + '/Silica.dat', group_vectors=True)
# Solid Silica - BKS potential, melted and quenched
# 216 atoms, T~1000K, dens~2.295g/cm^3
# NVE, dt = 1.0 fs, 100 ps, print_step = 1.0 fs
# Temperature = 983.172635 K, Volume = 3130.431110818 A^3
# LAMMPS metal units
Temp c_flux1[1] c_flux1[2] c_flux1[3]
 #####################################
  all_ckeys =  [('Temp', [0]), ('flux1', array([1, 2, 3]))]
 #####################################
Data length =  100001
[4]:
# list of available keys (column names) and their column indexes
print(f.all_ckeys)
{'Temp': [0], 'flux1': array([1, 2, 3])}
[5]:
# read the file loading the following columns
data = f.read_datalines(
    NSTEPS = 0,  # read all the steps
    select_ckeys = ['Temp', 'flux1'],  # read only these columns
)
  ckey =  [('Temp', [0]), ('flux1', array([1, 2, 3]))]
    step =    100000 - 100.00% completed
  ( 100000 ) steps read.
DONE.  Elapsed time:  0.7381811141967773 seconds
[6]:
# data can also be retrieved from the f.data dictionary
f.data
[6]:
{'Temp': array([[ 998.48171],
        [1003.699  ],
        [1003.8906 ],
        ...,
        [ 967.21723],
        [ 978.47566],
        [ 985.41455]]),
 'flux1': array([[ -265.30586 ,  1520.6107  ,    67.461829],
        [ -168.68352 ,  1377.4459  ,   101.82146 ],
        [  -93.688306,  1180.375   ,   117.20939 ],
        ...,
        [ 1226.9778  ,   212.0939  , -1126.4643  ],
        [ 1223.8753  ,   186.93836 ,  -881.39541 ],
        [ 1232.7723  ,   141.30647 ,  -620.41895 ]])}
[7]:
TEMPERATURE = np.mean(data['Temp'])
flux = data['flux1']
print(flux.shape)
(100000, 3)
[8]:
# cell and volume information can be set manually or retrieved from
# a LAMMPS data file (written using the `write_data` command)
box, VOLUME = st.i_o.read_lammps_datafile.get_box(data_path + '/lammps/silica_216_1000K.init')
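If you prefer to set the volume manually, it can be computed from the box bounds in the header of a LAMMPS data file. The sketch below uses a hypothetical helper, box_volume_from_datafile, which is not the get_box function used above. It assumes the standard "lo hi xlo xhi" header lines written by write_data and returns the volume in the data file's length units (Å for metal units). For a LAMMPS triclinic cell the optional xy xz yz tilt factors do not change the volume, which is still the product of the three box lengths.

```python
def box_volume_from_datafile(lines):
    """Return ((lx, ly, lz), volume) from the header of a LAMMPS data file."""
    bounds = (['xlo', 'xhi'], ['ylo', 'yhi'], ['zlo', 'zhi'])
    lengths = {}
    for line in lines:
        fields = line.split()
        # box lines look like "0.0 14.63 xlo xhi"
        if len(fields) >= 4 and fields[2:4] in bounds:
            lengths[fields[2][0]] = float(fields[1]) - float(fields[0])
        if len(lengths) == 3:
            break
    lx, ly, lz = lengths['x'], lengths['y'], lengths['z']
    # optional "xy xz yz" tilt factors do not change the volume of the cell
    return (lx, ly, lz), lx * ly * lz


# made-up data file header, for illustration only
header = """LAMMPS data file (made-up example)

216 atoms
2 atom types

0.0 10.0 xlo xhi
0.0 20.0 ylo yhi
-5.0 10.0 zlo zhi
""".splitlines()

lengths, volume = box_volume_from_datafile(header)
print(lengths, volume)  # (10.0, 20.0, 15.0) 3000.0
```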
[9]:
# we can finally create a HeatCurrent:
j = st.HeatCurrent(flux, units='metal', TEMPERATURE=TEMPERATURE, VOLUME=VOLUME, DT_FS=1.0)
Using single component code.

The data contained in a TableFile can be saved to a NumPy binary file, which can be reloaded faster at a later time:

[10]:
np.save('new_data.npy', f.data)

Side note: the file is read sequentially, so you can read the first and the second block of 1000 steps like this:

[11]:
f.gotostep(0)  # go back to the step number 0

f.read_datalines(NSTEPS=1000)  # read first 1000 steps
first_block = f.data

f.read_datalines(NSTEPS=1000)  # read next 1000 steps
second_block = f.data
  ckey =  [('Temp', [0]), ('flux1', array([1, 2, 3]))]
  ( 1000 ) steps read.
DONE.  Elapsed time:  0.023930072784423828 seconds
  ckey =  [('Temp', [0]), ('flux1', array([1, 2, 3]))]
  ( 1000 ) steps read.
DONE.  Elapsed time:  0.01228475570678711 seconds
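The same sequential pattern can be sketched without sportran: each call consumes the next chunk of lines from an open stream, so consecutive calls return consecutive blocks. The generator read_blocks below is a hypothetical illustration, assuming a headerless, whitespace-separated stream with one step per line.

```python
import io
from itertools import islice

import numpy as np


def read_blocks(stream, nsteps):
    """Yield consecutive arrays of at most nsteps rows from an open text stream."""
    while True:
        # islice consumes the next nsteps lines; the stream keeps its position
        lines = list(islice(stream, nsteps))
        if not lines:
            return
        yield np.loadtxt(lines, ndmin=2)


# made-up data: 2500 steps, 2 columns
stream = io.StringIO("\n".join(f"{i} {2 * i}" for i in range(2500)))
blocks = list(read_blocks(stream, 1000))
print([b.shape for b in blocks])  # [(1000, 2), (1000, 2), (500, 2)]
```

The last block is shorter because only 500 steps remain after the first two blocks.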

2. dict format

A dict-type file is a NumPy binary file that contains a dictionary.

It can be read simply like this:

[12]:
data = np.load(data_path + '/Silica.npy', allow_pickle=True).tolist()
data
[12]:
{'flux1': array([[ -265.30586 ,  1520.6107  ,    67.461829],
        [ -168.68352 ,  1377.4459  ,   101.82146 ],
        [  -93.688306,  1180.375   ,   117.20939 ],
        ...,
        [ 1226.9778  ,   212.0939  , -1126.4643  ],
        [ 1223.8753  ,   186.93836 ,  -881.39541 ],
        [ 1232.7723  ,   141.30647 ,  -620.41895 ]]),
 'Temperature': 983.1726353043,
 'Volume': 3130.431110818276,
 'DT_FS': 1.0,
 'units': 'lammps-metal'}
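The sketch below shows a round trip for this format using only NumPy: np.save stores the dictionary as a pickled 0-d object array, and np.load needs allow_pickle=True plus .tolist() (or .item()) to recover the dictionary. The file name and values are made up; because unpickling can execute arbitrary code, only load .npy files from trusted sources this way.

```python
import os
import tempfile

import numpy as np

# a small dictionary with the same keys as Silica.npy (values are made up)
original = {
    'flux1': np.random.default_rng(0).normal(size=(10, 3)),
    'Temperature': 1000.0,
    'Volume': 3000.0,
    'DT_FS': 1.0,
    'units': 'lammps-metal',
}

path = os.path.join(tempfile.mkdtemp(), 'example.npy')
# np.save wraps the dict in a 0-d object array, which is pickled
np.save(path, original)

# loading an object array requires allow_pickle=True; .tolist() unwraps
# the 0-d array back into a plain Python dict
loaded = np.load(path, allow_pickle=True).tolist()
print(type(loaded).__name__, sorted(loaded))
# dict ['DT_FS', 'Temperature', 'Volume', 'flux1', 'units']
```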
[13]:
# we can finally create a HeatCurrent:
j = st.HeatCurrent(
    data['flux1'], units='metal', TEMPERATURE=data['Temperature'], VOLUME=data['Volume'], DT_FS=data['DT_FS'])
Using single component code.

3. lammps format

A lammps-type file is a LAMMPS log file, i.e. the output generated by LAMMPS (see `log command <https://docs.lammps.org/log.html>`__); by default it is called log.lammps.

The LAMMPSLogFile class can parse a LAMMPS log file, convert its data into a dictionary, and optionally save it as a NumPy binary file.

Since a LAMMPS script may contain multiple run commands, we need to tell the parser which run it should read. We do so with a keyword string (called run_keyword) that it should look for: the parser skips all lines until it finds this string, so the string should be distinctive.

For example, it can be an uppercase comment line that you place just before the run command that launches the production run of your simulation. Below is an excerpt of a LAMMPS log file where the NVE RUN keyword has been inserted:

[14]:
!tail -n +174 ./data/Silica/lammps/silica.out | head -n 15
# NVE RUN
fix          NVE_RUN all nve
run          100000
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Per MPI rank memory allocation (min/avg/max) = 4.015 | 4.018 | 4.022 Mbytes
   Step          Time           Temp          PotEng         KinEng         TotEng         Press        c_flux1[1]     c_flux1[2]     c_flux1[3]
         0   0              998.48171     -3245.1241      27.748737     -3217.3753     -4193.4348     -265.30586      1520.6107      67.461829
         1   0.001          1003.699      -3245.2719      27.893731     -3217.3782     -5450.2034     -168.68352      1377.4459      101.82146
         2   0.002          1003.8906     -3245.2786      27.899055     -3217.3795     -6701.621      -93.688306      1180.375       117.20939
         3   0.003          998.1473      -3245.1188      27.739443     -3217.3794     -7882.8782     -42.571972      932.96168      111.11515
         4   0.004          986.48517     -3244.7932      27.415341     -3217.3778     -8949.7826     -15.323416      642.52765      85.389352
         5   0.005          969.86291     -3244.328       26.953393     -3217.3746     -9878.8464     -10.876607      319.90865      45.695167
         6   0.006          950.03861     -3243.7731      26.402458     -3217.3707     -10669.51      -27.873411     -21.428315     -0.1944876
         7   0.007          929.29852     -3243.1925      25.826071     -3217.3664     -11337.595     -64.46361      -366.51677     -44.776231
         8   0.008          910.08762     -3242.6549      25.292183     -3217.3627     -11909.803     -117.84517     -700.11875     -82.966928
[15]:
f = st.i_o.LAMMPSLogFile(data_path + '/lammps/silica.out', run_keyword='NVE RUN')
  run_keyword found at line 174.
  column headers found at line 179. Reading data...
 #####################################
  all_ckeys =  [('KinEng', array([4])), ('PotEng', array([3])), ('Press', array([6])), ('Step', array([0])), ('Temp', array([2])), ('Time', array([1])), ('TotEng', array([5])), ('flux1', array([7, 8, 9]))]
 #####################################
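The run_keyword logic can be sketched in a few lines: skip lines until the keyword, take the next line starting with Step as the column header, then read numeric rows until the first line that is not a complete data row (such as the "Loop time of ..." summary that LAMMPS prints after a run). This is a hypothetical helper (read_log_run), not LAMMPSLogFile; it assumes the thermo output starts with the Step column and does not group vector components such as c_flux1[1]. The sample log is abridged and made up.

```python
import numpy as np


def read_log_run(lines, run_keyword, first_column='Step'):
    """Return {column name: 1D array} for the thermo block after run_keyword."""
    it = iter(lines)
    # 1. skip every line up to (and including) the one containing run_keyword
    for line in it:
        if run_keyword in line:
            break
    else:
        raise ValueError(f'run_keyword {run_keyword!r} not found')
    # 2. the column header is the next line whose first word is first_column
    for line in it:
        if line.split()[:1] == [first_column]:
            names = line.split()
            break
    else:
        raise ValueError('thermo header not found')
    # 3. read numeric rows until a line that is not a complete data row
    rows = []
    for line in it:
        try:
            values = [float(x) for x in line.split()]
        except ValueError:
            break
        if len(values) != len(names):
            break
        rows.append(values)
    table = np.array(rows, dtype=float).reshape(-1, len(names))
    return {name: table[:, i] for i, name in enumerate(names)}


# abridged, made-up log excerpt with only three thermo columns
log = """# NVE RUN
fix          NVE_RUN all nve
run          2
   Step          Time           Temp
         0   0              998.48171
         1   0.001          1003.699
         2   0.002          1003.8906
Loop time of 0.01 on 1 procs for 2 steps with 216 atoms
""".splitlines()

data = read_log_run(log, 'NVE RUN')
print(list(data))  # ['Step', 'Time', 'Temp']
```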
[16]:
# read the file loading the following columns
data = f.read_datalines(
    NSTEPS = 0,  # read all the steps
    select_ckeys = None,  # columns to be read, if None read them all!
)
  ckey =  [('KinEng', array([4])), ('PotEng', array([3])), ('Press', array([6])), ('Step', array([0])), ('Temp', array([2])), ('Time', array([1])), ('TotEng', array([5])), ('flux1', array([7, 8, 9]))]
    step =    100000 -  99.96% completed
  endrun_keyword found.
  Retaining an even number of steps (even_NSTEPS=True).
  ( 100000 ) steps read.
DONE.  Elapsed time:  1.8878519535064697 seconds
[17]:
data.keys()
[17]:
dict_keys(['Step', 'Time', 'Temp', 'PotEng', 'KinEng', 'TotEng', 'Press', 'flux1'])
[18]:
TEMPERATURE = np.mean(data['Temp'])
flux = data['flux1']
print(flux.shape)
(100000, 3)
[19]:
# cell and volume information can be set manually or retrieved from
# a LAMMPS data file (written using the `write_data` command)
box, VOLUME = st.i_o.read_lammps_datafile.get_box(data_path + '/lammps/silica_216_1000K.init')
[20]:
# we can finally create a HeatCurrent:
j = st.HeatCurrent(flux, units='metal', TEMPERATURE=TEMPERATURE, VOLUME=VOLUME, DT_FS=1.0)
Using single component code.

Finally, we can write the content of the LAMMPSLogFile into a NumPy binary file, which can be reloaded faster at a later time:

[21]:
f.save_numpy_dict('new_data.npy',
                  select_ckeys=['Time', 'Temp', 'flux1'],
                  lammps_data_file=data_path + '/lammps/silica_216_1000K.init')
These keys will be saved in file "new_data.npy" :
  ['Temp_ave', 'Temp_std', 'Time', 'Temp', 'flux1', 'box', 'Volume', 'DT', 'DT_TIMEUNITS']

For further information, see the documentation of LAMMPSLogFile.