Example 5: input file formats
This example shows the three file formats that sportran can read:
``table``: a plain-text file where data is organized in columns.
``dict``: a Numpy binary file that contains a dictionary.
``lammps``: a LAMMPS output log file.
[1]:
import numpy as np
# import scipy as sp
# import matplotlib.pyplot as plt
try:
    import sportran as st
except ImportError:
    # fall back to the copy of sportran in the parent directory
    from sys import path
    path.append('..')
    import sportran as st
# c = plt.rcParams['axes.prop_cycle'].by_key()['color']
data_path = './data/Silica/'
1. table format
A ``table``-type file is a plain-text file containing multiple time series.
Each row represents a data point (a specific time), and each column represents a different time series (i.e. a variable, e.g. temperature, energy, heat flux, …).
The first line of each column contains the name of the variable. Cartesian components of vector quantities can be expressed with square brackets (e.g. ``vector[1]``).
Similarly to LAMMPS conventions, the ``c_`` and ``v_`` prefixes of column names are ignored, so e.g. ``c_flux`` will be saved as ``flux``.
Comment lines start with ``#``.
For further information, see the documentation of TableFile.
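The naming rules above can be illustrated with a minimal, standard-library sketch. It is not sportran's implementation: the helper name ``group_column_keys`` is hypothetical, and the sketch only assumes the conventions stated above (``c_``/``v_`` prefixes dropped, bracketed components grouped under one key).

```python
import re

def group_column_keys(header_line):
    """Map column names to column indexes, grouping vector components.

    Hypothetical helper for illustration only, not part of sportran.
    """
    keys = {}
    for index, name in enumerate(header_line.split()):
        # drop a leading LAMMPS-style 'c_' or 'v_' prefix
        name = re.sub(r'^(c_|v_)', '', name)
        # 'flux1[2]' -> base name 'flux1'; scalar names are kept unchanged
        match = re.fullmatch(r'(.+)\[(\d+)\]', name)
        base = match.group(1) if match else name
        keys.setdefault(base, []).append(index)
    return keys

header = 'Temp c_flux1[1] c_flux1[2] c_flux1[3]'
print(group_column_keys(header))  # {'Temp': [0], 'flux1': [1, 2, 3]}
```

The three components of ``c_flux1`` end up under a single ``flux1`` key, which is the kind of grouping that ``group_vectors=True`` asks for when the file is loaded.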
[2]:
# Example
!head -n 15 ./data/Silica/Silica.dat
# Solid Silica - BKS potential, melted and quenched
# 216 atoms, T~1000K, dens~2.295g/cm^3
# NVE, dt = 1.0 fs, 100 ps, print_step = 1.0 fs
# Temperature = 983.172635 K, Volume = 3130.431110818 A^3
# LAMMPS metal units
Temp c_flux1[1] c_flux1[2] c_flux1[3]
998.48171 -265.30586 1520.6107 67.461829
1003.699 -168.68352 1377.4459 101.82146
1003.8906 -93.688306 1180.375 117.20939
998.1473 -42.571972 932.96168 111.11515
986.48517 -15.323416 642.52765 85.389352
969.86291 -10.876607 319.90865 45.695167
950.03861 -27.873411 -21.428315 -0.1944876
929.29852 -64.46361 -366.51677 -44.776231
910.08762 -117.84517 -700.11875 -82.966928
[3]:
# Load table file
f = st.i_o.TableFile(data_path + '/Silica.dat', group_vectors=True)
# Solid Silica - BKS potential, melted and quenched
# 216 atoms, T~1000K, dens~2.295g/cm^3
# NVE, dt = 1.0 fs, 100 ps, print_step = 1.0 fs
# Temperature = 983.172635 K, Volume = 3130.431110818 A^3
# LAMMPS metal units
Temp c_flux1[1] c_flux1[2] c_flux1[3]
#####################################
all_ckeys = [('Temp', [0]), ('flux1', array([1, 2, 3]))]
#####################################
Data length = 100001
[4]:
# list of available keys (column names) and their column indexes
print(f.all_ckeys)
{'Temp': [0], 'flux1': array([1, 2, 3])}
[5]:
# read the file loading the following columns
data = f.read_datalines(
    NSTEPS = 0,   # read all the steps
    select_ckeys = ['Temp', 'flux1'],   # read only these columns
)
ckey = [('Temp', [0]), ('flux1', array([1, 2, 3]))]
step = 100000 - 100.00% completed
( 100000 ) steps read.
DONE. Elapsed time: 0.7381811141967773 seconds
[6]:
# data can also be retrieved from the f.data dictionary
f.data
[6]:
{'Temp': array([[ 998.48171],
[1003.699 ],
[1003.8906 ],
...,
[ 967.21723],
[ 978.47566],
[ 985.41455]]),
'flux1': array([[ -265.30586 , 1520.6107 , 67.461829],
[ -168.68352 , 1377.4459 , 101.82146 ],
[ -93.688306, 1180.375 , 117.20939 ],
...,
[ 1226.9778 , 212.0939 , -1126.4643 ],
[ 1223.8753 , 186.93836 , -881.39541 ],
[ 1232.7723 , 141.30647 , -620.41895 ]])}
[7]:
TEMPERATURE = np.mean(data['Temp'])
flux = data['flux1']
print(flux.shape)
(100000, 3)
[8]:
# cell and volume information can be set manually or retrieved from
# a LAMMPS data file (written using the `write_data` command)
box, VOLUME = st.i_o.read_lammps_datafile.get_box(data_path + '/lammps/silica_216_1000K.init')
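To set the volume manually, it helps to know where it comes from. The following is a rough sketch, not sportran's ``get_box``: it assumes an orthorhombic cell and the usual ``<lo> <hi> xlo xhi`` box lines of a LAMMPS data file. The function name and the numbers are made up for illustration.

```python
def orthorhombic_volume(datafile_text):
    """Return the volume of an orthorhombic box from LAMMPS data-file text.

    Hypothetical helper for illustration only; tilted (triclinic) cells
    are not handled.
    """
    lengths = {}
    for line in datafile_text.splitlines():
        fields = line.split()
        # box bounds look like: '<lo> <hi> xlo xhi'
        if len(fields) == 4 and fields[2] in ('xlo', 'ylo', 'zlo'):
            lo, hi = float(fields[0]), float(fields[1])
            lengths[fields[2][0]] = hi - lo
    return lengths['x'] * lengths['y'] * lengths['z']

# made-up header, not the content of silica_216_1000K.init
example = """LAMMPS data file (made-up numbers)

216 atoms

0.0 10.0 xlo xhi
0.0 20.0 ylo yhi
-5.0 25.0 zlo zhi
"""
print(orthorhombic_volume(example))  # 6000.0
```

In LAMMPS metal units the lengths are in Å, so the result is in Å^3, matching the unit of the volume quoted in the file header above.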
[9]:
# we can finally create a HeatCurrent:
j = st.HeatCurrent(flux, units='metal', TEMPERATURE=TEMPERATURE, VOLUME=VOLUME, DT_FS=1.0)
Using single component code.
Data contained in a ``TableFile`` can be converted to a Numpy binary file, which can be reloaded faster at a later time:
[10]:
np.save('new_data.npy', f.data)
Side note: the file is read sequentially, so you can read the first 1000 steps and then the next 1000 steps like this:
[11]:
f.gotostep(0) # go back to the step number 0
f.read_datalines(NSTEPS=1000) # read first 1000 steps
first_block = f.data
f.read_datalines(NSTEPS=1000) # read next 1000 steps
second_block = f.data
ckey = [('Temp', [0]), ('flux1', array([1, 2, 3]))]
( 1000 ) steps read.
DONE. Elapsed time: 0.023930072784423828 seconds
ckey = [('Temp', [0]), ('flux1', array([1, 2, 3]))]
( 1000 ) steps read.
DONE. Elapsed time: 0.01228475570678711 seconds
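The same sequential behaviour can be reproduced with any Python file object, as in the sketch below. It does not use sportran: ``read_block`` is a hypothetical helper and the data are made up. Each call resumes where the previous one stopped, and rewinding the handle plays the role of ``gotostep(0)``.

```python
import io
from itertools import islice

def read_block(handle, nsteps):
    """Read up to `nsteps` data rows from the current position of `handle`."""
    return [[float(x) for x in line.split()] for line in islice(handle, nsteps)]

# made-up two-column data: one row per step
handle = io.StringIO(''.join(f'{i} {2 * i}\n' for i in range(10)))

first_block = read_block(handle, 3)    # rows 0, 1, 2
second_block = read_block(handle, 3)   # rows 3, 4, 5: resumes after the first read
print(first_block[0], second_block[0])  # [0.0, 0.0] [3.0, 6.0]
```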
2. dict format
A ``dict``-type file is a Numpy binary file that contains a dictionary.
It can be read simply like this:
[12]:
data = np.load(data_path + '/Silica.npy', allow_pickle=True).tolist()
data
[12]:
{'flux1': array([[ -265.30586 , 1520.6107 , 67.461829],
[ -168.68352 , 1377.4459 , 101.82146 ],
[ -93.688306, 1180.375 , 117.20939 ],
...,
[ 1226.9778 , 212.0939 , -1126.4643 ],
[ 1223.8753 , 186.93836 , -881.39541 ],
[ 1232.7723 , 141.30647 , -620.41895 ]]),
'Temperature': 983.1726353043,
'Volume': 3130.431110818276,
'DT_FS': 1.0,
'units': 'lammps-metal'}
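Why the extra ``allow_pickle=True`` and ``.tolist()``? ``np.save`` stores a dictionary by wrapping it in a zero-dimensional object array, so loading returns that wrapper rather than the dictionary itself. The sketch below shows the round trip on a made-up dictionary written to a temporary directory (the file name is arbitrary).

```python
import os
import tempfile
import numpy as np

# made-up stand-in for the dictionary stored in a dict-type file
original = {'flux1': np.arange(6.0).reshape(2, 3), 'Temperature': 983.17, 'DT_FS': 1.0}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.npy')
    np.save(path, original)                     # dict is pickled inside a 0-d object array
    loaded = np.load(path, allow_pickle=True)   # pickled objects must be allowed explicitly
    restored = loaded.item()                    # unwrap; .tolist() gives the same result here

print(loaded.shape, type(restored).__name__)  # () dict
print(sorted(restored))                        # ['DT_FS', 'Temperature', 'flux1']
```

Because loading relies on pickle, files of this kind should only be opened when they come from a trusted source.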
[13]:
# we can finally create a HeatCurrent:
j = st.HeatCurrent(
data['flux1'], units='metal', TEMPERATURE=data['Temperature'], VOLUME=data['Volume'], DT_FS=data['DT_FS'])
Using single component code.
3. lammps format
A ``lammps``-type file is a LAMMPS log file, i.e. the output generated by LAMMPS (see the `log command <https://docs.lammps.org/log.html>`__); by default it is called ``log.lammps``.
The ``LAMMPSLogFile`` class can parse a LAMMPS log file, convert the data into a dictionary, and optionally save it as a Numpy binary file.
Since a LAMMPS script may contain multiple ``run`` commands, we need to tell the parser which run it should read. We do so by providing a keyword string (called ``run_keyword``) that it should look for. The parser skips all lines until it finds this string, so the string should be distinctive.
For example, it can be an uppercase COMMENT LINE that you have placed just before the ``run`` command that launches the production run of your simulation. This is an example of a LAMMPS log file where the ``NVE RUN`` keyword has been inserted:
[14]:
!tail -n +174 ./data/Silica/lammps/silica.out | head -n 15
# NVE RUN
fix NVE_RUN all nve
run 100000
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Per MPI rank memory allocation (min/avg/max) = 4.015 | 4.018 | 4.022 Mbytes
Step Time Temp PotEng KinEng TotEng Press c_flux1[1] c_flux1[2] c_flux1[3]
0 0 998.48171 -3245.1241 27.748737 -3217.3753 -4193.4348 -265.30586 1520.6107 67.461829
1 0.001 1003.699 -3245.2719 27.893731 -3217.3782 -5450.2034 -168.68352 1377.4459 101.82146
2 0.002 1003.8906 -3245.2786 27.899055 -3217.3795 -6701.621 -93.688306 1180.375 117.20939
3 0.003 998.1473 -3245.1188 27.739443 -3217.3794 -7882.8782 -42.571972 932.96168 111.11515
4 0.004 986.48517 -3244.7932 27.415341 -3217.3778 -8949.7826 -15.323416 642.52765 85.389352
5 0.005 969.86291 -3244.328 26.953393 -3217.3746 -9878.8464 -10.876607 319.90865 45.695167
6 0.006 950.03861 -3243.7731 26.402458 -3217.3707 -10669.51 -27.873411 -21.428315 -0.1944876
7 0.007 929.29852 -3243.1925 25.826071 -3217.3664 -11337.595 -64.46361 -366.51677 -44.776231
8 0.008 910.08762 -3242.6549 25.292183 -3217.3627 -11909.803 -117.84517 -700.11875 -82.966928
tail: error writing 'standard output': Broken pipe
[15]:
f = st.i_o.LAMMPSLogFile(data_path + '/lammps/silica.out', run_keyword='NVE RUN')
run_keyword found at line 174.
column headers found at line 179. Reading data...
#####################################
all_ckeys = [('KinEng', array([4])), ('PotEng', array([3])), ('Press', array([6])), ('Step', array([0])), ('Temp', array([2])), ('Time', array([1])), ('TotEng', array([5])), ('flux1', array([7, 8, 9]))]
#####################################
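The keyword search described above can be sketched in a few lines of plain Python. This is an illustration, not the ``LAMMPSLogFile`` parser: ``read_run`` is a hypothetical helper, the header is assumed to be the first line after the keyword that starts with ``Step``, and reading simply stops at the first line that is not a full row of numbers. The log text is abbreviated and partly made up (the short first run does not appear in ``silica.out``).

```python
def read_run(lines, run_keyword):
    """Return (headers, rows) for the first thermo table after `run_keyword`."""
    lines = iter(lines)
    # 1. skip everything up to and including the line containing the keyword
    for line in lines:
        if run_keyword in line:
            break
    else:
        raise ValueError(f'run_keyword {run_keyword!r} not found')
    # 2. assumption: the column headers are the next line starting with 'Step'
    for line in lines:
        if line.split()[:1] == ['Step']:
            headers = line.split()
            break
    else:
        raise ValueError('column headers not found')
    # 3. collect rows until a line that is not a full row of numbers
    rows = []
    for line in lines:
        try:
            row = [float(x) for x in line.split()]
        except ValueError:
            break
        if len(row) != len(headers):
            break
        rows.append(row)
    return headers, rows

log = """\
run 100
Step Temp
0 300.0
Loop time of 0.1 on 1 procs
# NVE RUN
fix NVE_RUN all nve
run 100000
Step Temp c_flux1[1]
0 998.48171 -265.30586
1 1003.699 -168.68352
Loop time of 1.9 on 1 procs
"""
headers, rows = read_run(log.splitlines(), 'NVE RUN')
print(headers)    # ['Step', 'Temp', 'c_flux1[1]']
print(len(rows))  # 2
```

The table printed by the first ``run`` is ignored because it appears before the keyword, which is why a distinctive comment placed right before the production run works well.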
[16]:
# read the file loading the following columns
data = f.read_datalines(
    NSTEPS = 0,   # read all the steps
    select_ckeys = None,   # columns to be read, if None read them all!
)
ckey = [('KinEng', array([4])), ('PotEng', array([3])), ('Press', array([6])), ('Step', array([0])), ('Temp', array([2])), ('Time', array([1])), ('TotEng', array([5])), ('flux1', array([7, 8, 9]))]
step = 100000 - 99.96% completed
endrun_keyword found.
Retaining an even number of steps (even_NSTEPS=True).
( 100000 ) steps read.
DONE. Elapsed time: 1.8878519535064697 seconds
[17]:
data.keys()
[17]:
dict_keys(['Step', 'Time', 'Temp', 'PotEng', 'KinEng', 'TotEng', 'Press', 'flux1'])
[18]:
TEMPERATURE = np.mean(data['Temp'])
flux = data['flux1']
print(flux.shape)
(100000, 3)
[19]:
# cell and volume information can be set manually or retrieved from
# a LAMMPS data file (written using the `write_data` command)
box, VOLUME = st.i_o.read_lammps_datafile.get_box(data_path + '/lammps/silica_216_1000K.init')
[20]:
# we can finally create a HeatCurrent:
j = st.HeatCurrent(flux, units='metal', TEMPERATURE=TEMPERATURE, VOLUME=VOLUME, DT_FS=1.0)
Using single component code.
Finally, we can write the content of the ``LAMMPSLogFile`` into a Numpy binary file, which can be reloaded faster at a later time:
[21]:
f.save_numpy_dict('new_data.npy',
select_ckeys=['Time', 'Temp', 'flux1'],
lammps_data_file=data_path + '/lammps/silica_216_1000K.init')
These keys will be saved in file "new_data.npy" :
['Temp_ave', 'Temp_std', 'Time', 'Temp', 'flux1', 'box', 'Volume', 'DT', 'DT_TIMEUNITS']
For further information, see the documentation of LAMMPSLogFile.