Thursday, January 30, 2020

Creating a new Solr Core

What is a Solr Core?


A Solr core is basically an index of the text and fields found in the documents that we publish to Solr.
A single Solr instance can contain multiple cores, which are kept separate from each other based on our own criteria.
For example, if my project has 2 databases, I may decide to use a separate core for each database.
Each core will have its own configuration and its own indexed data storage location.
When the Solr server runs in standalone mode, this configuration is called a Core.
When the Solr server runs in Cloud mode, this configuration is called a Collection.

This core folder will contain all the configuration files and the indexed data.

Let’s create a Core configuration for a standalone server

We can create the core using the solr create command.
Navigate to the solr-6.2.0\bin folder in a command prompt
and run the below command:
solr create -c MyCore
Now navigate to solr-6.2.0\server\solr.
We can see that a MyCore folder has been created, and this folder will be used as the configuration folder for our indexing.

Inside the MyCore folder we can see 2 folders, conf (used for configuration) and data (used for storing the indexed data), along with a core.properties file which contains the name of the core.
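For reference, the freshly created core folder looks roughly like this (the exact contents of conf may vary slightly between Solr versions):

solr-6.2.0\server\solr\MyCore
    conf\               (solrconfig.xml, the managed schema and other configuration files)
    data\               (index files, created once documents are indexed)
    core.properties     (core-level properties such as the core name)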


Let’s understand the core concepts and make some changes in MyCore

There are 4 important files related to every core, listed below:
1) solr.xml
2) solrconfig.xml
3) schema.xml
4) core.properties
1) solr.xml

This file sits in the same place where our new MyCore folder is located (the Solr home directory, solr-6.2.0\server\solr).
It is used to configure node-level settings that apply to the Solr cores.
2) solrconfig.xml

This file is created automatically when we run the solr create command and is available inside the MyCore/conf directory.
It is used to configure the core at a high level.
For example, we can change the location of the data directory in this file, and Lucene-related index settings are also defined here.
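For instance, the data directory is controlled by the dataDir element; the stock solrconfig.xml typically ships with an entry like the one below, which we could point to any other path:

<dataDir>${solr.data.dir:}</dataDir>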

At the end of this file, just before the </config> tag, add the below line:

<schemaFactory class="ClassicIndexSchemaFactory"/>
This switches the core to classic schema mode, where we can manually edit schema.xml to define our own fields and field type descriptions.
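After this edit, the end of solrconfig.xml should look roughly like this:

  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>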

3) schema.xml

This file is generated as managed-schema when we create a new core, because Solr uses schemaless mode by default.
The managed-schema file is available inside the MyCore/conf directory.
Rename this file to schema.xml, since we have changed the configuration in solrconfig.xml to use classic schema mode.
The managed schema is what Solr uses in schemaless mode, which is enabled by default.
This file contains the description of the fields that we pass to Solr for indexing.
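As a rough sketch (the field names here are only examples and not part of the default schema), a minimal schema.xml could look like this:

<schema name="MyCore" version="1.6">
  <!-- a simple non-analyzed string type -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <!-- the fields we want Solr to index and store -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="firstName" type="string" indexed="true" stored="true"/>
  <field name="lastName" type="string" indexed="true" stored="true"/>
  <uniqueKey>id</uniqueKey>
</schema>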
4) core.properties

This file is created automatically when we run the solr create command and is available inside the MyCore directory.
It defines the properties used by our core, such as the core name, the solrconfig file, the schema file, etc.
If we don’t add any values inside core.properties, the default values are used automatically.
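For illustration, a core.properties that spells these values out explicitly might look like the below (only the name is strictly required; the other entries simply restate the defaults):

name=MyCore
config=solrconfig.xml
schema=schema.xml
dataDir=data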

Since we have changed some of the configuration details, restart the Solr server.

Navigate to the below path in a command prompt:
solr-6.2.0\bin
Stop the Solr server using the below command:
solr stop -all
Now start the Solr server again using the below command:
solr start
Let’s access Solr through the web UI, which by default is available at http://localhost:8983/solr, and check whether the core is listed in the Core Selector.
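If you prefer the command line, the Core Admin API (assuming the default port 8983) can also confirm that the core exists:

curl "http://localhost:8983/solr/admin/cores?action=STATUS&core=MyCore"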

What is Indexing?


Indexing is the process of arranging data in a systematic and efficient way so that the information in a document can be located much faster.


Let’s understand Indexing with the below examples


Example 1

Assume that I have the following table in the database, which stores a person's details.
Person Table
If we want to fetch the records whose last name is Dravid, the database scans each and every row to check whether the last name matches Dravid; if it matches, that row is added to the result set.
This requires going through each and every row, even the rows whose last name is something other than Dravid.
Don't you think it takes more time, as it has to scan through unwanted rows?
Yes, it will certainly take more time, as it goes through each and every row.
Now observe the below data.

Here we have arranged the table data in alphabetical order of last name.
Now when we search for the last name Dravid, we can identify the right rows based on the alphabetical order and then fetch the result accordingly.
Here it is not required to go through all the rows to search for Dravid, because we know that LastName is arranged in alphabetical order.
This improves the performance.
If we have 100,000 rows, it improves the search performance drastically.

Example 2

Another example would be a textbook which has an index in it.
Assume we want to find a chapter called Brave Man; if the book has no index defined at the beginning, it is very difficult to find that chapter, and it also takes more time.
If we have an index defined like below,
ChapterName : page number
then we can easily look up the chapter name, get the page number from the index, and open the chapter in the book without spending much time.
So in both cases, indexing increases the performance of searching.
This is exactly what many websites need, especially e-commerce sites where people search a lot.
This way of representing the data in a more efficient form to make search faster is called indexing.
There are many frameworks available in the market which help to achieve indexing and also provide many more features along with it, such as faceted navigation, hit highlighting, caching, etc.
Some of the frameworks available in the market are Solr, Sphinx, Elasticsearch, Algolia, Swiftype, etc.
Each framework has its own way of indexing the data published to it.
Solr is the most widely used open source search server, with a run rate of over 6,000 downloads a day and installations at over 4,000 companies, according to the Solr wiki statistics at the time of this post.
Check the below link to read the same.

Solr overview & Setup

Solr Overview


Solr is open source software developed by the Apache Software Foundation.
It is a search server which uses Apache Lucene in the backend and provides a REST API that can be called from any language or platform to get the indexed data or the search results.
Apache Lucene is the Java library which provides the indexing and search functionality.
Both Solr and Lucene are managed by Apache.
Applications can use this search platform, Solr, to implement faster searching on their sites.
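As a rough illustration of that REST API (assuming the default port, the MyCore core created earlier and a lastName field as in the Person example), a search is just an HTTP call that returns the results, here as JSON:

curl "http://localhost:8983/solr/MyCore/select?q=lastName:Dravid&wt=json"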

Download and set up the Solr server


We can download Solr from the below URL.
Click here to download Solr

Click on the highlighted link in the above image.
Select the Solr zip file from this page for downloading.
Once the download completes, unzip the downloaded folder.

Installing Java

We must have Java installed on the machine to work with Solr (because Solr is developed in Java).
If it is not already installed, please install JDK 8 from the below URL.
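To confirm Java is available, run the below command from a command prompt; Solr 6.x needs Java 8 or later:

java -version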

How to start the Solr server?

Open a command prompt and go to the bin folder of the unzipped Solr directory on your system.
If you are in a different drive, switch to the drive where Solr was downloaded using the command Drive: (for example, D: to switch to the D drive).
On my system, Solr is downloaded to the D drive.
cd D:\solr-6.2.0\bin
Run the below command:
solr start
It now displays that the Solr server has started, as shown below.
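As an optional check, we can also run the below command from the same bin folder to verify that the server is up and see which port it is listening on:

solr status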